2024-08-09 12:33:18,283 INFO [train_multi_KD3.py:1187] (3/4) Training started 2024-08-09 12:33:18,284 INFO [train_multi_KD3.py:1197] (3/4) Device: cuda:3 2024-08-09 12:33:18,288 INFO [train_multi_KD3.py:1212] (3/4) Using dtype=torch.bfloat16 2024-08-09 12:33:18,289 INFO [train_multi_KD3.py:1214] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True} 2024-08-09 12:33:18,289 INFO [train_multi_KD3.py:1216] (3/4) About 
to create model 2024-08-09 12:33:18,700 INFO [model_shift.py:142] (3/4) Delta_t: 6 when computing the distillation loss 2024-08-09 12:33:18,704 INFO [train_multi_KD3.py:1220] (3/4) Number of model parameters: 66484678 2024-08-09 12:33:20,625 INFO [train_multi_KD3.py:1235] (3/4) Using DDP 2024-08-09 12:33:22,053 INFO [kd_datamodule.py:690] (3/4) About to get train 960 cuts 2024-08-09 12:33:22,113 INFO [train_multi_KD3.py:1306] (3/4) Getting audioset cuts 2024-08-09 12:33:22,113 INFO [kd_datamodule.py:900] (3/4) About to get the audioset cuts for KD. 2024-08-09 12:33:22,119 INFO [kd_datamodule.py:869] (3/4) About to get the voxceleb cuts. 2024-08-09 12:33:22,121 INFO [kd_datamodule.py:880] (3/4) Adding voxceleb2 cuts. 2024-08-09 12:33:22,123 INFO [train_multi_KD3.py:1320] (3/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True 2024-08-09 12:33:30,768 INFO [train_multi_KD3.py:1322] (3/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]] 2024-08-09 12:33:30,768 INFO [train_multi_KD3.py:1323] (3/4) Using weights: [1406195, 1904746, 1187704] 2024-08-09 12:33:30,768 INFO [train_multi_KD3.py:1332] (3/4) CutSet(len=4498645) [underlying data type: ] 2024-08-09 12:33:30,768 INFO [kd_datamodule.py:449] (3/4) Disable MUSAN 2024-08-09 12:33:30,770 INFO [kd_datamodule.py:489] (3/4) Disable SpecAugment 2024-08-09 12:33:30,771 INFO [kd_datamodule.py:491] (3/4) About to create train dataset 2024-08-09 12:33:30,773 INFO [kd_datamodule.py:528] (3/4) Using SimpleCutSampler 2024-08-09 12:33:30,774 INFO [kd_datamodule.py:536] (3/4) About to create train dataloader 2024-08-09 12:33:30,776 INFO [kd_datamodule.py:763] (3/4) About to get dev-clean cuts 2024-08-09 12:33:30,778 INFO [kd_datamodule.py:781] (3/4) About to get dev-other cuts 2024-08-09 12:33:30,780 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-09 
12:33:31,031 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-09 12:33:31,032 INFO [kd_datamodule.py:840] (3/4) About to get the test set of voxceleb1 set. 2024-08-09 12:33:31,038 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-09 12:33:31,285 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-09 12:33:31,286 INFO [kd_datamodule.py:912] (3/4) About to get the audioset eval cuts. 2024-08-09 12:33:31,288 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-09 12:33:31,890 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-09 12:33:31,890 INFO [train_multi_KD3.py:1412] (3/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-09 12:33:47,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 0, loss[loss=1.014, beats_loss=0.6663, ecapa_loss=0.002298, whisper_loss=0.3247, over 15561.00 frames. ], tot_loss[loss=1.014, beats_loss=0.6663, ecapa_loss=0.002298, whisper_loss=0.3247, over 15561.00 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 2.0 2024-08-09 12:33:47,519 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 12:34:33,700 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on ASR_libri: loss=0.9193, beats_loss=0, ecapa_loss=0.006113, whisper_loss=0.8581, over 922467.00 frames. 2024-08-09 12:34:46,727 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8035, 4.2337, 4.7264, 4.7816], device='cuda:3') 2024-08-09 12:34:48,304 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on SV_voxceleb1: loss=0.05055, beats_loss=0, ecapa_loss=0.005055, whisper_loss=0, over 939242.00 frames. 
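The loss records above report three distillation components per batch (beats_loss, ecapa_loss, whisper_loss) alongside a total. The config dump at the top sets beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0, and the logged totals are consistent with a plain weighted sum of the components (e.g. at batch 0: 0.6663 + 10 × 0.002298 + 0.3247 ≈ 1.014). A minimal sketch under that assumption; the actual combination inside train_multi_KD3.py is not shown in this log:

```python
def combined_kd_loss(beats_loss, ecapa_loss, whisper_loss,
                     beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Hypothetical reconstruction of the logged total loss as a weighted
    sum of the three KD components, using the scales from the config dump."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)
```

The same combination also matches the batch 50 record below (0.1327 + 10 × 0.001863 + 0.1869 ≈ 0.3382), so the ecapa term, despite its small raw value, contributes roughly as much as ten unscaled copies would.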
2024-08-09 12:35:24,616 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.9095, 3.8552, 3.8776, 3.9143], device='cuda:3') 2024-08-09 12:36:59,577 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on AT_audioset: loss=1.752, beats_loss=1.752, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 12:36:59,579 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 12:37:13,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=0.0, ans=0.2 2024-08-09 12:37:21,203 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 from AS 2024-08-09 12:37:22,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=0.0, ans=0.5 2024-08-09 12:37:29,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=48.30 vs. limit=7.5375 2024-08-09 12:37:50,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=427.81 vs. limit=7.5375 2024-08-09 12:37:50,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=11.82 vs. limit=5.025 2024-08-09 12:37:54,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=4.08 2024-08-09 12:37:56,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=508.51 vs. 
limit=7.65 2024-08-09 12:37:58,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200.0, ans=0.475 2024-08-09 12:38:01,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=44.81 vs. limit=7.575 2024-08-09 12:38:05,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=200.0, ans=7.575 2024-08-09 12:38:10,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=48.65 vs. limit=7.575 2024-08-09 12:38:12,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=38.96 vs. limit=7.575 2024-08-09 12:38:15,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=200.0, ans=0.248 2024-08-09 12:38:19,659 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 12:38:24,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=300.0, ans=0.8895000000000001 2024-08-09 12:38:27,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=375.05 vs. limit=7.725 2024-08-09 12:38:29,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=24.06 vs. limit=7.6125 2024-08-09 12:38:32,345 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 15 from Vox, 35 from AS 2024-08-09 12:38:36,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=48.93 vs. limit=7.725 2024-08-09 12:38:49,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=191.24 vs. limit=7.8 2024-08-09 12:38:59,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=400.0, ans=0.48125 2024-08-09 12:39:04,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 50, loss[loss=0.2441, beats_loss=0.01655, ecapa_loss=0.001902, whisper_loss=0.2085, over 24805.00 frames. ], tot_loss[loss=0.3382, beats_loss=0.1327, ecapa_loss=0.001863, whisper_loss=0.1869, over 871332.11 frames. ], batch size: 91, lr: 2.48e-02, grad_scale: 2.0 2024-08-09 12:39:15,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=500.0, ans=0.4765625 2024-08-09 12:39:17,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=500.0, ans=0.4765625 2024-08-09 12:39:18,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=222.39 vs. limit=7.6875 2024-08-09 12:39:22,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=11.31 vs. limit=5.125 2024-08-09 12:39:28,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.78 vs. limit=3.09 2024-08-09 12:39:39,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=342.15 vs. 
limit=7.95 2024-08-09 12:39:44,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=39.04 vs. limit=7.725 2024-08-09 12:39:46,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=30.85 vs. limit=7.725 2024-08-09 12:39:54,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700.0, ans=0.293 2024-08-09 12:39:56,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=31.18 vs. limit=4.28 2024-08-09 12:39:58,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=46.00 vs. limit=8.025 2024-08-09 12:40:10,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=4.32 2024-08-09 12:40:14,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=326.46 vs. limit=7.8 2024-08-09 12:40:22,791 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 12:40:23,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=314.33 vs. limit=8.1 2024-08-09 12:40:26,411 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 14 from Vox, 29 from AS 2024-08-09 12:40:33,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=900.0, ans=0.07975 2024-08-09 12:40:33,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=470.94 vs. limit=7.8375 2024-08-09 12:40:39,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=283.98 vs. limit=5.45 2024-08-09 12:40:46,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=304.78 vs. limit=7.8375 2024-08-09 12:40:48,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=324.56 vs. limit=7.8375 2024-08-09 12:40:48,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=142.08 vs. limit=7.8375 2024-08-09 12:40:51,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.930e+01 4.445e+01 8.118e+01 2.890e+03, threshold=8.890e+01, percent-clipped=0.0 2024-08-09 12:40:51,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 100, loss[loss=0.2127, beats_loss=0.02075, ecapa_loss=0.001795, whisper_loss=0.174, over 19197.00 frames. ], tot_loss[loss=0.2658, beats_loss=0.07014, ecapa_loss=0.00187, whisper_loss=0.177, over 1543840.58 frames. 
], batch size: 73, lr: 2.70e-02, grad_scale: 4.0 2024-08-09 12:40:58,951 WARNING [optim.py:496] (3/4) Scaling gradients by 0.048358626663684845, model_norm_threshold=88.8975601196289 2024-08-09 12:40:59,137 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.2.norm.log_scale with proportion 0.88, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.987e+06, grad_sumsq=2.987e+06, orig_rms_sq=1.000e+00 2024-08-09 12:41:00,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=49.57 vs. limit=8.25 2024-08-09 12:41:04,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=296.06 vs. limit=7.875 2024-08-09 12:41:06,820 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 12:41:08,727 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS 2024-08-09 12:41:11,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.05 vs. limit=5.275 2024-08-09 12:41:14,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=220.25 vs. limit=7.9125 2024-08-09 12:41:18,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=290.59 vs. limit=7.9125 2024-08-09 12:41:20,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=222.93 vs. 
limit=7.9125 2024-08-09 12:41:30,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1200.0, ans=0.04625 2024-08-09 12:41:30,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=253.10 vs. limit=8.4 2024-08-09 12:41:32,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=205.10 vs. limit=7.95 2024-08-09 12:41:33,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1200.0, ans=0.07300000000000001 2024-08-09 12:41:34,604 WARNING [optim.py:496] (3/4) Scaling gradients by 0.011974900029599667, model_norm_threshold=88.8975601196289 2024-08-09 12:41:34,782 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.96, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.313e+07, grad_sumsq=5.313e+07, orig_rms_sq=1.000e+00 2024-08-09 12:41:40,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=191.21 vs. limit=7.95 2024-08-09 12:41:41,662 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 24 from Vox, 26 from AS 2024-08-09 12:41:43,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1300.0, ans=0.4390625 2024-08-09 12:41:45,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=221.20 vs. limit=7.9875 2024-08-09 12:41:48,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=251.22 vs. 
limit=8.475 2024-08-09 12:41:48,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=435.55 vs. limit=7.9875 2024-08-09 12:41:59,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=117.08 vs. limit=8.025 2024-08-09 12:42:14,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=362.76 vs. limit=8.625 2024-08-09 12:42:15,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=382.43 vs. limit=8.0625 2024-08-09 12:42:15,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 150, loss[loss=0.23, beats_loss=0.01662, ecapa_loss=0.001932, whisper_loss=0.194, over 21417.00 frames. ], tot_loss[loss=0.2401, beats_loss=0.04966, ecapa_loss=0.001877, whisper_loss=0.1717, over 2064305.91 frames. ], batch size: 83, lr: 2.93e-02, grad_scale: 4.0 2024-08-09 12:42:18,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=323.90 vs. limit=8.0625 2024-08-09 12:42:21,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=217.28 vs. limit=5.75 2024-08-09 12:42:22,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=206.78 vs. 
limit=5.75 2024-08-09 12:42:28,161 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04562794789671898, model_norm_threshold=88.8975601196289 2024-08-09 12:42:28,347 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.64, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.426e+06, grad_sumsq=2.426e+06, orig_rms_sq=1.000e+00 2024-08-09 12:42:30,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1600.0, ans=0.035 2024-08-09 12:42:43,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=324.44 vs. limit=8.1 2024-08-09 12:42:46,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1700.0, ans=0.8405 2024-08-09 12:42:46,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=163.38 vs. limit=5.85 2024-08-09 12:42:53,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=296.81 vs. limit=8.1375 2024-08-09 12:42:53,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=28.76 vs. limit=5.425 2024-08-09 12:42:55,580 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 12:42:57,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1700.0, ans=0.283 2024-08-09 12:43:02,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1800.0, ans=0.415625 2024-08-09 12:43:03,322 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
30 from LS+wenet, 28 from Vox, 28 from AS 2024-08-09 12:43:07,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=239.08 vs. limit=8.175 2024-08-09 12:43:12,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1800.0, ans=0.837 2024-08-09 12:43:20,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=89.72 vs. limit=8.2125 2024-08-09 12:43:20,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=129.10 vs. limit=5.95 2024-08-09 12:43:21,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=120.10 vs. limit=8.2125 2024-08-09 12:43:29,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1900.0, ans=6.1875 2024-08-09 12:43:31,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=225.31 vs. limit=8.925 2024-08-09 12:43:34,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+01 2.774e+01 3.640e+01 5.016e+01 7.424e+03, threshold=7.280e+01, percent-clipped=13.0 2024-08-09 12:43:34,342 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 200, loss[loss=0.254, beats_loss=0.01861, ecapa_loss=0.001646, whisper_loss=0.2189, over 20005.00 frames. ], tot_loss[loss=0.2287, beats_loss=0.0389, ecapa_loss=0.001873, whisper_loss=0.171, over 2477992.79 frames. 
], batch size: 76, lr: 3.15e-02, grad_scale: 8.0 2024-08-09 12:43:38,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=31.32 vs. limit=5.5 2024-08-09 12:43:40,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=228.69 vs. limit=8.25 2024-08-09 12:43:41,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2000.0, ans=0.27999999999999997 2024-08-09 12:43:41,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=249.78 vs. limit=8.25 2024-08-09 12:43:47,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=4.8 2024-08-09 12:43:51,614 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 12:43:52,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=178.83 vs. limit=8.2875 2024-08-09 12:43:53,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=85.71 vs. 
limit=8.2875 2024-08-09 12:43:54,408 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06407187134027481, model_norm_threshold=72.79639434814453 2024-08-09 12:43:54,571 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.47, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.083e+05, grad_sumsq=6.083e+05, orig_rms_sq=1.000e+00 2024-08-09 12:44:06,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=236.39 vs. limit=9.15 2024-08-09 12:44:07,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=112.77 vs. limit=8.325 2024-08-09 12:44:08,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=143.59 vs. limit=9.15 2024-08-09 12:44:13,544 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS 2024-08-09 12:44:13,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2200.0, ans=0.396875 2024-08-09 12:44:20,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=247.61 vs. limit=9.225 2024-08-09 12:44:23,800 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=85.90 vs. limit=8.3625 2024-08-09 12:44:25,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2300.0, ans=0.27699999999999997 2024-08-09 12:44:26,254 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 18 from Vox, 27 from AS 2024-08-09 12:44:29,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2300.0, ans=0.04825 2024-08-09 12:44:36,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=26.41 vs. limit=6.2 2024-08-09 12:44:40,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=68.93 vs. limit=8.4 2024-08-09 12:44:42,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2400.0, ans=0.11 2024-08-09 12:44:43,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=264.52 vs. limit=8.4 2024-08-09 12:44:43,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=31.92 vs. limit=9.3 2024-08-09 12:44:47,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2400.0, ans=0.774 2024-08-09 12:44:49,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=215.43 vs. limit=8.4 2024-08-09 12:44:52,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 250, loss[loss=0.236, beats_loss=0.01235, ecapa_loss=0.001913, whisper_loss=0.2045, over 18644.00 frames. ], tot_loss[loss=0.2178, beats_loss=0.03312, ecapa_loss=0.001846, whisper_loss=0.1662, over 2762979.41 frames. ], batch size: 70, lr: 3.38e-02, grad_scale: 8.0 2024-08-09 12:44:59,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=125.19 vs. 
limit=8.4375 2024-08-09 12:45:02,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2500.0, ans=0.176875 2024-08-09 12:45:03,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=206.00 vs. limit=9.375 2024-08-09 12:45:08,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2600.0, ans=0.23900000000000002 2024-08-09 12:45:10,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=25.01 vs. limit=5.65 2024-08-09 12:45:13,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2600.0, ans=0.27399999999999997 2024-08-09 12:45:15,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=86.20 vs. limit=6.3 2024-08-09 12:45:18,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=176.66 vs. limit=9.45 2024-08-09 12:45:18,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=284.05 vs. limit=8.475 2024-08-09 12:45:26,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=5.08 2024-08-09 12:45:32,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=111.15 vs. 
limit=9.525 2024-08-09 12:45:43,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2800.0, ans=8.55 2024-08-09 12:45:45,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2800.0, ans=0.36875 2024-08-09 12:45:51,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2800.0, ans=8.55 2024-08-09 12:45:55,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=129.93 vs. limit=8.5875 2024-08-09 12:45:57,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2900.0, ans=0.1375 2024-08-09 12:45:58,551 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 12:45:58,919 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=6.649e+00 2024-08-09 12:45:59,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2900.0, ans=0.3640625 2024-08-09 12:45:59,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=21.52 vs. limit=8.5875 2024-08-09 12:46:10,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.536e+01 4.623e+01 6.113e+01 1.136e+03, threshold=9.245e+01, percent-clipped=13.0 2024-08-09 12:46:10,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 300, loss[loss=0.1787, beats_loss=0.02123, ecapa_loss=0.001734, whisper_loss=0.1401, over 18044.00 frames. 
], tot_loss[loss=0.2115, beats_loss=0.02897, ecapa_loss=0.001835, whisper_loss=0.1641, over 2961987.31 frames. ], batch size: 72, lr: 3.60e-02, grad_scale: 8.0 2024-08-09 12:46:19,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3000.0, ans=6.5 2024-08-09 12:46:23,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=45.67 vs. limit=9.75 2024-08-09 12:46:24,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=41.09 vs. limit=8.625 2024-08-09 12:46:33,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3100.0, ans=0.030249999999999985 2024-08-09 12:46:33,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=33.42 vs. limit=8.6625 2024-08-09 12:46:42,030 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 12:46:51,179 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 12:46:57,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.85 vs. limit=9.975 2024-08-09 12:46:58,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=13.24 vs. limit=5.825 2024-08-09 12:46:59,485 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 12:47:07,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=43.32 vs. 
limit=8.7375 2024-08-09 12:47:09,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3300.0, ans=0.07625 2024-08-09 12:47:24,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3400.0, ans=0.340625 2024-08-09 12:47:28,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 350, loss[loss=0.1817, beats_loss=0.01863, ecapa_loss=0.001601, whisper_loss=0.1471, over 20133.00 frames. ], tot_loss[loss=0.2039, beats_loss=0.02604, ecapa_loss=0.001795, whisper_loss=0.1599, over 3157194.10 frames. ], batch size: 80, lr: 3.83e-02, grad_scale: 8.0 2024-08-09 12:47:35,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=26.17 vs. limit=8.8125 2024-08-09 12:47:38,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.50 vs. limit=5.875 2024-08-09 12:47:43,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3600.0, ans=0.06499999999999997 2024-08-09 12:47:43,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=42.18 vs. limit=6.8 2024-08-09 12:47:55,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.85 vs. 
limit=5.4399999999999995 2024-08-09 12:48:18,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3800.0, ans=0.321875 2024-08-09 12:48:29,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3900.0, ans=0.05374999999999999 2024-08-09 12:48:40,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=32.25 vs. limit=8.9625 2024-08-09 12:48:42,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=96.19 vs. limit=8.9625 2024-08-09 12:48:46,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.862e+01 3.339e+01 4.177e+01 8.866e+01, threshold=6.678e+01, percent-clipped=0.0 2024-08-09 12:48:46,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 400, loss[loss=0.1649, beats_loss=0.01946, ecapa_loss=0.001732, whisper_loss=0.1281, over 21556.00 frames. ], tot_loss[loss=0.1992, beats_loss=0.02374, ecapa_loss=0.00176, whisper_loss=0.1579, over 3308191.91 frames. ], batch size: 90, lr: 4.05e-02, grad_scale: 16.0 2024-08-09 12:48:48,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=5.6 2024-08-09 12:48:48,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=52.40 vs. limit=9.0 2024-08-09 12:49:01,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=68.77 vs. limit=9.0375 2024-08-09 12:49:04,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.42 vs. 
limit=9.0375 2024-08-09 12:49:05,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=9.0375 2024-08-09 12:49:06,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4100.0, ans=0.209 2024-08-09 12:49:13,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.28 vs. limit=9.0375 2024-08-09 12:49:15,969 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 12:49:16,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4200.0, ans=0.303125 2024-08-09 12:49:17,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=9.075 2024-08-09 12:49:17,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=59.22 vs. limit=9.075 2024-08-09 12:49:20,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=17.79 vs. limit=7.1 2024-08-09 12:49:23,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=10.65 2024-08-09 12:49:25,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=5.68 2024-08-09 12:49:26,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.13 vs. 
limit=7.1 2024-08-09 12:49:29,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4200.0, ans=9.075 2024-08-09 12:49:29,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=28.74 vs. limit=10.65 2024-08-09 12:49:52,285 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 18 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 12:49:56,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4400.0, ans=9.15 2024-08-09 12:50:02,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=36.47 vs. limit=10.875 2024-08-09 12:50:03,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 450, loss[loss=0.1743, beats_loss=0.01775, ecapa_loss=0.001557, whisper_loss=0.1409, over 18919.00 frames. ], tot_loss[loss=0.1925, beats_loss=0.02216, ecapa_loss=0.001718, whisper_loss=0.1531, over 3398928.80 frames. ], batch size: 74, lr: 4.28e-02, grad_scale: 16.0 2024-08-09 12:50:08,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=44.53 vs. limit=10.875 2024-08-09 12:50:10,735 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 12:50:13,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=9.1875 2024-08-09 12:50:24,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.08 vs. 
limit=9.225 2024-08-09 12:50:25,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4600.0, ans=0.254 2024-08-09 12:50:25,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=9.225 2024-08-09 12:50:31,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4600.0, ans=9.225 2024-08-09 12:50:34,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.85 vs. limit=9.2625 2024-08-09 12:50:46,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4700.0, ans=0.2796875 2024-08-09 12:50:46,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=9.2625 2024-08-09 12:50:46,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.53 vs. limit=9.2625 2024-08-09 12:50:48,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=30.74 vs. limit=7.4 2024-08-09 12:50:56,190 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 12:51:01,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=9.3 2024-08-09 12:51:11,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.02 vs. 
limit=5.0 2024-08-09 12:51:13,420 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 12:51:15,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4900.0, ans=0.2703125 2024-08-09 12:51:18,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+01 2.551e+01 3.113e+01 4.254e+01 7.113e+01, threshold=6.225e+01, percent-clipped=1.0 2024-08-09 12:51:18,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 500, loss[loss=0.1677, beats_loss=0.01721, ecapa_loss=0.001434, whisper_loss=0.1361, over 23128.00 frames. ], tot_loss[loss=0.1872, beats_loss=0.0208, ecapa_loss=0.001677, whisper_loss=0.1496, over 3517284.86 frames. ], batch size: 92, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:51:23,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=3.75 2024-08-09 12:51:23,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=5000.0, ans=11.25 2024-08-09 12:51:24,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=20.39 vs. limit=7.5 2024-08-09 12:51:31,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=21.75 vs. limit=7.5 2024-08-09 12:51:32,336 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 12:51:46,114 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 12:51:48,777 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-09 12:51:51,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=25.98 vs. limit=9.45 2024-08-09 12:51:54,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=11.4 2024-08-09 12:52:01,392 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 12:52:04,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=5300.0, ans=0.066875 2024-08-09 12:52:06,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=35.84 vs. limit=11.475 2024-08-09 12:52:16,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=5300.0, ans=0.197 2024-08-09 12:52:20,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=6.16 2024-08-09 12:52:23,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=9.525 2024-08-09 12:52:28,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=9.525 2024-08-09 12:52:35,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 550, loss[loss=0.166, beats_loss=0.01591, ecapa_loss=0.001423, whisper_loss=0.1359, over 18514.00 frames. ], tot_loss[loss=0.1839, beats_loss=0.01973, ecapa_loss=0.001638, whisper_loss=0.1478, over 3570382.46 frames. 
], batch size: 70, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:52:36,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.46 vs. limit=11.625 2024-08-09 12:52:44,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=23.66 vs. limit=9.5625 2024-08-09 12:53:01,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=9.6 2024-08-09 12:53:02,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5600.0, ans=0.2375 2024-08-09 12:53:13,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5700.0, ans=0.04291666666666667 2024-08-09 12:53:19,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=5700.0, ans=0.03218750000000001 2024-08-09 12:53:20,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5800.0, ans=0.6970000000000001 2024-08-09 12:53:28,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=9.675 2024-08-09 12:53:32,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=39.05 vs. limit=11.85 2024-08-09 12:53:34,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. 
limit=11.85 2024-08-09 12:53:42,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5900.0, ans=0.2885 2024-08-09 12:53:47,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5900.0, ans=0.2234375 2024-08-09 12:53:51,129 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 12:53:52,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=9.75 2024-08-09 12:53:52,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+01 2.262e+01 2.880e+01 3.640e+01 5.434e+01, threshold=5.761e+01, percent-clipped=0.0 2024-08-09 12:53:52,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 600, loss[loss=0.152, beats_loss=0.01764, ecapa_loss=0.001668, whisper_loss=0.1177, over 14863.00 frames. ], tot_loss[loss=0.1813, beats_loss=0.01895, ecapa_loss=0.001598, whisper_loss=0.1463, over 3626970.42 frames. ], batch size: 63, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:53:55,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=9.75 2024-08-09 12:53:55,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=58.90 vs. limit=9.75 2024-08-09 12:54:02,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=6000.0, ans=0.21875 2024-08-09 12:54:11,509 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
17 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 12:54:15,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=8.05 2024-08-09 12:54:26,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=6.48 2024-08-09 12:54:27,479 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-09 12:54:33,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=6200.0, ans=0.20937499999999998 2024-08-09 12:54:41,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=32.78 vs. limit=12.225 2024-08-09 12:54:47,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.37 vs. limit=8.15 2024-08-09 12:54:47,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. limit=6.575 2024-08-09 12:54:58,061 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 12:55:01,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=33.40 vs. limit=9.9 2024-08-09 12:55:08,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=9.9 2024-08-09 12:55:10,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 650, loss[loss=0.1507, beats_loss=0.01658, ecapa_loss=0.001345, whisper_loss=0.1207, over 16998.00 frames. 
], tot_loss[loss=0.1787, beats_loss=0.01828, ecapa_loss=0.001557, whisper_loss=0.1449, over 3671505.41 frames. ], batch size: 68, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:55:17,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.74 vs. limit=12.375 2024-08-09 12:55:18,026 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-09 12:55:24,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=9.975 2024-08-09 12:55:25,278 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 12:55:32,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=9.975 2024-08-09 12:55:32,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=9.975 2024-08-09 12:55:39,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=43.87 vs. 
limit=10.0125 2024-08-09 12:55:42,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=6700.0, ans=0.18593749999999998 2024-08-09 12:55:54,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=6800.0, ans=0.02 2024-08-09 12:55:54,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=6800.0, ans=0.0575 2024-08-09 12:55:54,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=10.05 2024-08-09 12:56:01,091 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 12:56:21,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=6900.0, ans=0.1765625 2024-08-09 12:56:24,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.370e+01 2.323e+01 2.699e+01 3.837e+01 7.112e+01, threshold=5.398e+01, percent-clipped=6.0 2024-08-09 12:56:25,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 700, loss[loss=0.1463, beats_loss=0.0181, ecapa_loss=0.001228, whisper_loss=0.1159, over 19301.00 frames. ], tot_loss[loss=0.1766, beats_loss=0.0177, ecapa_loss=0.001518, whisper_loss=0.1437, over 3691400.26 frames. ], batch size: 77, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:56:25,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=32.52 vs. 
limit=12.75 2024-08-09 12:56:43,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7100.0, ans=0.1671875 2024-08-09 12:56:46,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=10.1625 2024-08-09 12:56:50,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.24 vs. limit=12.825 2024-08-09 12:56:55,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=7200.0, ans=0.009304347826086957 2024-08-09 12:57:01,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=6.88 2024-08-09 12:57:03,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=10.2 2024-08-09 12:57:06,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=7200.0, ans=0.16249999999999998 2024-08-09 12:57:06,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=10.2 2024-08-09 12:57:19,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. 
limit=6.92 2024-08-09 12:57:22,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=7300.0, ans=0.6445000000000001 2024-08-09 12:57:36,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7400.0, ans=0.153125 2024-08-09 12:57:40,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 750, loss[loss=0.1671, beats_loss=0.01809, ecapa_loss=0.001035, whisper_loss=0.1387, over 16374.00 frames. ], tot_loss[loss=0.1752, beats_loss=0.01721, ecapa_loss=0.001464, whisper_loss=0.1433, over 3741765.83 frames. ], batch size: 62, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:57:42,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.97 vs. limit=13.125 2024-08-09 12:57:43,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=13.125 2024-08-09 12:57:46,797 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 12:57:57,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=7600.0, ans=0.14375 2024-08-09 12:58:02,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=7600.0, ans=0.009217391304347827 2024-08-09 12:58:02,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=7600.0, ans=0.035 2024-08-09 12:58:03,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=7600.0, ans=0.035 2024-08-09 12:58:03,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7600.0, ans=0.224 2024-08-09 12:58:16,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=7700.0, ans=0.13906249999999998 2024-08-09 12:58:19,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=7700.0, ans=0.009195652173913044 2024-08-09 12:58:42,252 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-09 12:58:44,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=7900.0, ans=0.17099999999999999 2024-08-09 12:58:57,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.305e+01 2.802e+01 3.610e+01 6.792e+01, threshold=5.604e+01, percent-clipped=3.0 2024-08-09 12:58:57,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 800, loss[loss=0.1444, beats_loss=0.01905, ecapa_loss=0.001083, whisper_loss=0.1146, over 22865.00 frames. 
], tot_loss[loss=0.1733, beats_loss=0.01682, ecapa_loss=0.001418, whisper_loss=0.1423, over 3762494.77 frames. ], batch size: 93, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 12:59:02,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=8000.0, ans=0.125 2024-08-09 12:59:04,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=8000.0, ans=0.0 2024-08-09 12:59:08,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=13.5 2024-08-09 12:59:19,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=27.15 vs. limit=10.5375 2024-08-09 12:59:20,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=8100.0, ans=0.125 2024-08-09 12:59:22,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=10.5375 2024-08-09 12:59:24,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=7.025 2024-08-09 12:59:30,701 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 12:59:39,747 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.219e+00 2024-08-09 12:59:46,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=7.075 2024-08-09 12:59:54,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8300.0, ans=0.217 2024-08-09 13:00:03,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=13.8 2024-08-09 13:00:13,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 850, loss[loss=0.1638, beats_loss=0.01563, ecapa_loss=0.001091, whisper_loss=0.1372, over 19532.00 frames. ], tot_loss[loss=0.1694, beats_loss=0.01658, ecapa_loss=0.001363, whisper_loss=0.1392, over 3791898.45 frames. ], batch size: 77, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 13:00:14,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=8500.0, ans=0.125 2024-08-09 13:00:19,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=8500.0, ans=0.6025 2024-08-09 13:00:40,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.83 vs. limit=9.3 2024-08-09 13:00:46,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=10.7625 2024-08-09 13:00:57,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=8800.0, ans=0.008956521739130436 2024-08-09 13:00:58,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.20 vs. limit=14.1 2024-08-09 13:01:00,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.96 vs. 
limit=7.52 2024-08-09 13:01:03,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=10.8 2024-08-09 13:01:09,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212 2024-08-09 13:01:10,495 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-09 13:01:13,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8900.0, ans=0.125 2024-08-09 13:01:26,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+01 2.129e+01 2.561e+01 3.167e+01 6.018e+01, threshold=5.121e+01, percent-clipped=3.0 2024-08-09 13:01:26,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 900, loss[loss=0.1602, beats_loss=0.01182, ecapa_loss=0.001459, whisper_loss=0.1338, over 14471.00 frames. ], tot_loss[loss=0.1664, beats_loss=0.0163, ecapa_loss=0.001317, whisper_loss=0.1369, over 3769746.96 frames. ], batch size: 60, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:01:27,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2024-08-09 13:01:35,180 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 13:01:36,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=9000.0, ans=0.125 2024-08-09 13:01:40,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.62 vs. 
limit=10.9125 2024-08-09 13:01:43,976 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.497e+00 2024-08-09 13:01:53,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9200.0, ans=0.125 2024-08-09 13:02:00,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=14.4 2024-08-09 13:02:10,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9300.0, ans=0.20700000000000002 2024-08-09 13:02:14,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9300.0, ans=0.125 2024-08-09 13:02:24,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=9400.0, ans=0.125 2024-08-09 13:02:25,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=9400.0, ans=0.008826086956521739 2024-08-09 13:02:26,901 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 13:02:33,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.92 vs. limit=11.025 2024-08-09 13:02:37,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 950, loss[loss=0.1676, beats_loss=0.0136, ecapa_loss=0.001054, whisper_loss=0.1434, over 19343.00 frames. ], tot_loss[loss=0.164, beats_loss=0.0161, ecapa_loss=0.001264, whisper_loss=0.1353, over 3782373.46 frames. ], batch size: 73, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:02:37,928 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 24 from Vox, 16 fro AS 2024-08-09 13:02:39,415 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 13:03:04,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=9600.0, ans=0.125 2024-08-09 13:03:11,461 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 13:03:41,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=9900.0, ans=0.125 2024-08-09 13:03:50,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+01 2.154e+01 2.525e+01 3.011e+01 6.635e+01, threshold=5.049e+01, percent-clipped=1.0 2024-08-09 13:03:50,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1000, loss[loss=0.1452, beats_loss=0.01625, ecapa_loss=0.001243, whisper_loss=0.1165, over 19866.00 frames. ], tot_loss[loss=0.1616, beats_loss=0.01585, ecapa_loss=0.001221, whisper_loss=0.1335, over 3792777.10 frames. ], batch size: 88, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:04:06,217 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 13:04:08,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=11.2875 2024-08-09 13:04:09,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=10100.0, ans=0.008673913043478261 2024-08-09 13:04:13,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=15.075 2024-08-09 13:04:15,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. 
limit=10.05 2024-08-09 13:04:43,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=10300.0, ans=0.5395000000000001 2024-08-09 13:04:43,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=11.3625 2024-08-09 13:04:48,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=11.4 2024-08-09 13:04:55,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=10400.0, ans=0.125 2024-08-09 13:05:04,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1050, loss[loss=0.1582, beats_loss=0.01362, ecapa_loss=0.001077, whisper_loss=0.1338, over 18318.00 frames. ], tot_loss[loss=0.1598, beats_loss=0.01559, ecapa_loss=0.001184, whisper_loss=0.1324, over 3797335.86 frames. ], batch size: 70, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:05:13,269 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 13:05:18,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=11.475 2024-08-09 13:05:27,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=7.65 2024-08-09 13:05:45,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. 
limit=11.5125 2024-08-09 13:05:48,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=10800.0, ans=0.02166666666666667 2024-08-09 13:05:54,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10800.0, ans=0.0 2024-08-09 13:06:10,183 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-09 13:06:13,430 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 13:06:16,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=10900.0, ans=0.5185000000000001 2024-08-09 13:06:18,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.283e+01 2.878e+01 3.739e+01 7.694e+01, threshold=5.756e+01, percent-clipped=7.0 2024-08-09 13:06:18,872 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1100, loss[loss=0.1568, beats_loss=0.0148, ecapa_loss=0.001064, whisper_loss=0.1314, over 20681.00 frames. ], tot_loss[loss=0.1578, beats_loss=0.01542, ecapa_loss=0.001142, whisper_loss=0.131, over 3820900.86 frames. ], batch size: 84, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:06:38,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=11.6625 2024-08-09 13:06:45,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=11.6625 2024-08-09 13:06:47,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=11200.0, ans=0.125 2024-08-09 13:06:53,236 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 13:06:58,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.92 vs. limit=15.9 2024-08-09 13:06:59,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11200.0, ans=0.125 2024-08-09 13:07:02,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=11300.0, ans=0.00841304347826087 2024-08-09 13:07:04,833 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 13:07:10,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=11300.0, ans=0.125 2024-08-09 13:07:11,532 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-09 13:07:22,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=11400.0, ans=0.01916666666666667 2024-08-09 13:07:31,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1150, loss[loss=0.1714, beats_loss=0.01251, ecapa_loss=0.0008261, whisper_loss=0.1506, over 18189.00 frames. ], tot_loss[loss=0.156, beats_loss=0.01523, ecapa_loss=0.001102, whisper_loss=0.1297, over 3794254.35 frames. ], batch size: 65, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:07:38,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.07 vs. limit=16.125 2024-08-09 13:07:39,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=11500.0, ans=0.05 2024-08-09 13:07:45,575 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 13:07:49,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=11600.0, ans=0.125 2024-08-09 13:07:53,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11600.0, ans=0.184 2024-08-09 13:07:59,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=11.85 2024-08-09 13:07:59,834 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 13:08:13,612 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 13:08:38,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=11900.0, ans=0.125 2024-08-09 13:08:42,484 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 13:08:45,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.329e+01 2.685e+01 3.204e+01 5.571e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-09 13:08:45,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1200, loss[loss=0.1567, beats_loss=0.01214, ecapa_loss=0.001127, whisper_loss=0.1333, over 14365.00 frames. ], tot_loss[loss=0.1546, beats_loss=0.01514, ecapa_loss=0.001067, whisper_loss=0.1288, over 3811329.41 frames. ], batch size: 59, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:08:55,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=12.0 2024-08-09 13:08:57,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.98 vs. 
limit=12.0 2024-08-09 13:09:06,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.39 vs. limit=8.025 2024-08-09 13:09:40,415 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 13:09:40,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=12300.0, ans=0.125 2024-08-09 13:09:40,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=12300.0, ans=11.15 2024-08-09 13:09:53,241 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-09 13:09:58,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1250, loss[loss=0.1425, beats_loss=0.01283, ecapa_loss=0.001107, whisper_loss=0.1186, over 17872.00 frames. ], tot_loss[loss=0.1524, beats_loss=0.01501, ecapa_loss=0.001031, whisper_loss=0.1271, over 3808133.36 frames. ], batch size: 73, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:10:27,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=17.025 2024-08-09 13:10:37,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=12700.0, ans=0.4555 2024-08-09 13:10:54,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=12800.0, ans=0.00808695652173913 2024-08-09 13:10:56,989 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 13:11:12,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.459e+01 3.175e+01 4.087e+01 8.300e+01, threshold=6.351e+01, percent-clipped=6.0 2024-08-09 13:11:13,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1300, loss[loss=0.144, beats_loss=0.0132, ecapa_loss=0.00081, whisper_loss=0.1227, over 19596.00 frames. ], tot_loss[loss=0.1508, beats_loss=0.015, ecapa_loss=0.001001, whisper_loss=0.1258, over 3807197.53 frames. ], batch size: 72, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:11:13,255 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-09 13:11:16,218 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 13:11:26,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=13100.0, ans=0.125 2024-08-09 13:11:28,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.30 vs. limit=17.325 2024-08-09 13:11:36,910 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 13:11:37,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=13100.0, ans=0.125 2024-08-09 13:12:03,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=13300.0, ans=0.125 2024-08-09 13:12:03,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=13300.0, ans=0.3995 2024-08-09 13:12:07,745 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 13:12:21,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=13400.0, ans=0.43100000000000005 2024-08-09 13:12:25,567 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 13:12:27,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1350, loss[loss=0.1426, beats_loss=0.01478, ecapa_loss=0.0008855, whisper_loss=0.1189, over 17459.00 frames. ], tot_loss[loss=0.149, beats_loss=0.01488, ecapa_loss=0.0009707, whisper_loss=0.1244, over 3808032.26 frames. ], batch size: 72, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:12:27,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.66 vs. limit=17.625 2024-08-09 13:12:27,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=8.375 2024-08-09 13:12:29,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=13500.0, ans=0.010416666666666671 2024-08-09 13:12:41,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=13600.0, ans=0.16399999999999998 2024-08-09 13:12:48,016 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 13:12:48,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13600.0, ans=0.16399999999999998 2024-08-09 13:13:21,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=13800.0, ans=0.125 2024-08-09 13:13:23,878 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 13:13:31,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=12.7125 2024-08-09 13:13:32,496 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 13:13:40,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.453e+01 2.894e+01 3.668e+01 7.407e+01, threshold=5.787e+01, percent-clipped=1.0 2024-08-09 13:13:41,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1400, loss[loss=0.1233, beats_loss=0.01227, ecapa_loss=0.0009012, whisper_loss=0.102, over 18061.00 frames. ], tot_loss[loss=0.1468, beats_loss=0.01486, ecapa_loss=0.0009416, whisper_loss=0.1226, over 3823733.16 frames. ], batch size: 73, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:13:47,447 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 13:13:53,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=14000.0, ans=0.125 2024-08-09 13:13:54,588 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 13:14:13,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=14200.0, ans=0.125 2024-08-09 13:14:15,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=12.825 2024-08-09 13:14:17,748 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 13:14:53,369 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 13:14:54,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.58 vs. limit=8.6 2024-08-09 13:14:55,628 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:14:56,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1450, loss[loss=0.1305, beats_loss=0.01608, ecapa_loss=0.0005671, whisper_loss=0.1087, over 19819.00 frames. ], tot_loss[loss=0.1463, beats_loss=0.01468, ecapa_loss=0.0009115, whisper_loss=0.1225, over 3829280.31 frames. ], batch size: 74, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:14:58,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=14500.0, ans=0.0062500000000000056 2024-08-09 13:15:24,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=9.8 2024-08-09 13:15:35,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=12.975 2024-08-09 13:15:56,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=14700.0, ans=0.007673913043478261 2024-08-09 13:16:06,032 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 13:16:16,759 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 15 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-09 13:16:23,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. 
limit=5.234999999999999 2024-08-09 13:16:31,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.402e+01 3.110e+01 4.073e+01 8.821e+01, threshold=6.219e+01, percent-clipped=9.0 2024-08-09 13:16:31,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1500, loss[loss=0.1381, beats_loss=0.01378, ecapa_loss=0.000697, whisper_loss=0.1174, over 18435.00 frames. ], tot_loss[loss=0.1453, beats_loss=0.01467, ecapa_loss=0.0008839, whisper_loss=0.1218, over 3873358.79 frames. ], batch size: 70, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:16:38,577 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 13:16:50,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=15100.0, ans=0.125 2024-08-09 13:17:12,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=15200.0, ans=0.125 2024-08-09 13:17:19,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15300.0, ans=0.14700000000000002 2024-08-09 13:17:37,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=15400.0, ans=0.125 2024-08-09 13:17:43,894 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-09 13:17:52,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1550, loss[loss=0.1694, beats_loss=0.007727, ecapa_loss=0.0009535, whisper_loss=0.1521, over 14901.00 frames. ], tot_loss[loss=0.1442, beats_loss=0.01464, ecapa_loss=0.000867, whisper_loss=0.1209, over 3858373.65 frames. ], batch size: 57, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:18:31,652 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 13:19:06,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=19.425 2024-08-09 13:19:12,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.449e+01 2.841e+01 3.798e+01 6.790e+01, threshold=5.683e+01, percent-clipped=3.0 2024-08-09 13:19:12,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1600, loss[loss=0.1481, beats_loss=0.01386, ecapa_loss=0.0008048, whisper_loss=0.1262, over 21952.00 frames. ], tot_loss[loss=0.1432, beats_loss=0.01453, ecapa_loss=0.0008493, whisper_loss=0.1202, over 3846803.63 frames. ], batch size: 86, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:19:15,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16000.0, ans=0.14 2024-08-09 13:19:26,061 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 13:19:30,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=16100.0, ans=0.125 2024-08-09 13:19:47,302 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 13:20:12,490 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 13:20:31,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=13.65 2024-08-09 13:20:33,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1650, loss[loss=0.1225, beats_loss=0.01477, ecapa_loss=0.0007539, whisper_loss=0.1002, over 18313.00 frames. ], tot_loss[loss=0.1421, beats_loss=0.0145, ecapa_loss=0.000828, whisper_loss=0.1193, over 3839920.07 frames. 
], batch size: 75, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:20:51,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=16600.0, ans=0.31900000000000006 2024-08-09 13:21:07,639 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 13:21:10,183 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.572e-02 2024-08-09 13:21:15,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16700.0, ans=0.125 2024-08-09 13:21:44,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16900.0, ans=0.0 2024-08-09 13:21:48,651 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-09 13:21:54,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.579e+01 3.058e+01 4.131e+01 8.941e+01, threshold=6.115e+01, percent-clipped=7.0 2024-08-09 13:21:54,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1700, loss[loss=0.1337, beats_loss=0.01151, ecapa_loss=0.0008162, whisper_loss=0.1141, over 15846.00 frames. ], tot_loss[loss=0.1422, beats_loss=0.01441, ecapa_loss=0.0008107, whisper_loss=0.1197, over 3858116.22 frames. 
], batch size: 62, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:22:03,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=17000.0, ans=0.125 2024-08-09 13:22:03,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=17000.0, ans=0.30500000000000005 2024-08-09 13:22:13,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=17100.0, ans=0.125 2024-08-09 13:22:24,627 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-09 13:22:25,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=17200.0, ans=0.125 2024-08-09 13:22:25,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=13.95 2024-08-09 13:22:32,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17200.0, ans=0.128 2024-08-09 13:22:40,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=17200.0, ans=0.125 2024-08-09 13:22:43,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=17300.0, ans=0.125 2024-08-09 13:23:03,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. 
limit=20.55 2024-08-09 13:23:05,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=17400.0, ans=0.125 2024-08-09 13:23:13,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1750, loss[loss=0.1166, beats_loss=0.01446, ecapa_loss=0.0008615, whisper_loss=0.09349, over 18847.00 frames. ], tot_loss[loss=0.1414, beats_loss=0.0144, ecapa_loss=0.000793, whisper_loss=0.119, over 3856302.30 frames. ], batch size: 78, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:23:25,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.89 vs. limit=13.75 2024-08-09 13:23:27,930 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-09 13:23:31,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17600.0, ans=0.125 2024-08-09 13:23:45,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=17700.0, ans=0.0 2024-08-09 13:23:46,363 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 13:24:08,231 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-09 13:24:27,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.728e+01 3.350e+01 4.234e+01 7.677e+01, threshold=6.699e+01, percent-clipped=2.0 2024-08-09 13:24:27,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1800, loss[loss=0.1447, beats_loss=0.01375, ecapa_loss=0.0006625, whisper_loss=0.1244, over 18606.00 frames. ], tot_loss[loss=0.141, beats_loss=0.01423, ecapa_loss=0.0007756, whisper_loss=0.119, over 3829873.94 frames. 
], batch size: 71, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:24:28,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=18000.0, ans=0.125 2024-08-09 13:24:30,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=18000.0, ans=0.0 2024-08-09 13:24:30,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=21.0 2024-08-09 13:24:35,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18000.0, ans=0.12000000000000002 2024-08-09 13:24:39,652 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-09 13:24:41,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=18100.0, ans=0.125 2024-08-09 13:25:04,959 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 29 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 13:25:05,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=14.325 2024-08-09 13:25:10,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=18300.0, ans=0.125 2024-08-09 13:25:12,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=18300.0, ans=0.04949747468305833 2024-08-09 13:25:14,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=14.3625 2024-08-09 13:25:43,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1850, loss[loss=0.1316, beats_loss=0.01505, ecapa_loss=0.0008954, whisper_loss=0.1076, over 22136.00 frames. ], tot_loss[loss=0.1399, beats_loss=0.01423, ecapa_loss=0.0007633, whisper_loss=0.118, over 3850011.43 frames. ], batch size: 92, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:25:46,978 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 13:26:03,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=18600.0, ans=0.0 2024-08-09 13:26:21,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.07 vs. limit=21.525 2024-08-09 13:26:34,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18800.0, ans=0.11200000000000002 2024-08-09 13:26:37,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.67 vs. limit=9.7 2024-08-09 13:26:49,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18900.0, ans=0.125 2024-08-09 13:26:53,820 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 13:27:03,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.592e+01 3.002e+01 4.008e+01 1.371e+02, threshold=6.005e+01, percent-clipped=3.0 2024-08-09 13:27:03,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1900, loss[loss=0.1202, beats_loss=0.01322, ecapa_loss=0.0007387, whisper_loss=0.09955, over 19502.00 frames. ], tot_loss[loss=0.1393, beats_loss=0.01424, ecapa_loss=0.0007606, whisper_loss=0.1175, over 3817931.64 frames. 
], batch size: 76, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:27:10,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=19000.0, ans=0.0 2024-08-09 13:27:10,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=19000.0, ans=0.125 2024-08-09 13:27:19,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=19100.0, ans=0.4865 2024-08-09 13:27:29,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=21.825 2024-08-09 13:27:38,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=14.7 2024-08-09 13:27:42,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=19200.0, ans=0.125 2024-08-09 13:27:51,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19300.0, ans=0.10700000000000001 2024-08-09 13:27:53,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=19300.0, ans=0.0 2024-08-09 13:28:00,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=19300.0, ans=0.125 2024-08-09 13:28:05,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=14.775 2024-08-09 13:28:20,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 1950, loss[loss=0.1162, beats_loss=0.01687, ecapa_loss=0.0006798, whisper_loss=0.09253, over 17254.00 frames. 
], tot_loss[loss=0.1388, beats_loss=0.01428, ecapa_loss=0.0007565, whisper_loss=0.117, over 3822852.40 frames. ], batch size: 70, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:28:33,564 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 13:28:33,971 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.171e-01 2024-08-09 13:28:36,527 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 13:29:11,139 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-09 13:29:35,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.629e+01 3.262e+01 3.981e+01 7.661e+01, threshold=6.525e+01, percent-clipped=2.0 2024-08-09 13:29:35,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2000, loss[loss=0.1421, beats_loss=0.00945, ecapa_loss=0.001019, whisper_loss=0.1225, over 18153.00 frames. ], tot_loss[loss=0.1383, beats_loss=0.01418, ecapa_loss=0.0007538, whisper_loss=0.1166, over 3813316.83 frames. ], batch size: 77, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:29:39,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=20000.0, ans=0.2 2024-08-09 13:29:45,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:29:46,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:29:55,242 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
35 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-09 13:29:55,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=20100.0, ans=0.0 2024-08-09 13:30:02,159 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 13:30:31,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2024-08-09 13:30:35,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20300.0, ans=0.1 2024-08-09 13:30:38,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=20400.0, ans=0.0 2024-08-09 13:30:43,898 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-09 13:30:54,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2050, loss[loss=0.1196, beats_loss=0.01276, ecapa_loss=0.0007169, whisper_loss=0.09965, over 18550.00 frames. ], tot_loss[loss=0.1375, beats_loss=0.01412, ecapa_loss=0.0007484, whisper_loss=0.1159, over 3811587.84 frames. ], batch size: 68, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:31:01,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.78 vs. limit=22.5 2024-08-09 13:31:02,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=12.0 2024-08-09 13:31:11,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2024-08-09 13:31:17,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-09 13:31:20,407 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-09 13:31:40,996 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-09 13:31:54,607 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.757e-02 2024-08-09 13:32:05,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20900.0, ans=0.1 2024-08-09 13:32:08,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.818e+01 3.204e+01 4.044e+01 7.345e+01, threshold=6.407e+01, percent-clipped=1.0 2024-08-09 13:32:08,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2100, loss[loss=0.1424, beats_loss=0.01397, ecapa_loss=0.0006681, whisper_loss=0.1217, over 23237.00 frames. ], tot_loss[loss=0.136, beats_loss=0.01424, ecapa_loss=0.0007413, whisper_loss=0.1143, over 3771651.10 frames. ], batch size: 91, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:32:19,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=21000.0, ans=0.125 2024-08-09 13:32:34,533 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:32:38,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-08-09 13:32:42,793 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 13:32:49,464 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
20 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-09 13:32:51,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-09 13:32:53,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.10 vs. limit=10.0 2024-08-09 13:33:00,231 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-09 13:33:11,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.11 vs. limit=10.0 2024-08-09 13:33:21,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=21400.0, ans=0.04949747468305833 2024-08-09 13:33:21,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=21400.0, ans=0.125 2024-08-09 13:33:24,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=21500.0, ans=0.125 2024-08-09 13:33:25,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2150, loss[loss=0.1484, beats_loss=0.01411, ecapa_loss=0.0005961, whisper_loss=0.1283, over 17704.00 frames. ], tot_loss[loss=0.1352, beats_loss=0.01419, ecapa_loss=0.0007328, whisper_loss=0.1137, over 3757011.77 frames. ], batch size: 66, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:33:28,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=21500.0, ans=0.0 2024-08-09 13:33:39,959 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 13:33:41,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=21600.0, ans=0.125 2024-08-09 13:33:47,089 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 13:33:52,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=21600.0, ans=10.0 2024-08-09 13:34:03,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=21700.0, ans=0.2 2024-08-09 13:34:10,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21800.0, ans=0.0 2024-08-09 13:34:30,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.11 vs. limit=22.5 2024-08-09 13:34:39,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=21900.0, ans=0.2 2024-08-09 13:34:42,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.673e+01 3.209e+01 4.237e+01 7.311e+01, threshold=6.417e+01, percent-clipped=1.0 2024-08-09 13:34:42,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2200, loss[loss=0.1383, beats_loss=0.01293, ecapa_loss=0.0007215, whisper_loss=0.1182, over 23424.00 frames. ], tot_loss[loss=0.1356, beats_loss=0.01407, ecapa_loss=0.000727, whisper_loss=0.1143, over 3752730.44 frames. ], batch size: 94, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:35:21,582 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 13:35:24,117 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 34 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 13:35:25,417 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-09 13:35:28,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=22300.0, ans=0.0 2024-08-09 13:35:54,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-08-09 13:35:57,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=22400.0, ans=0.2 2024-08-09 13:36:01,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2250, loss[loss=0.1844, beats_loss=0.008006, ecapa_loss=0.0006263, whisper_loss=0.1701, over 15471.00 frames. ], tot_loss[loss=0.1367, beats_loss=0.01399, ecapa_loss=0.0007215, whisper_loss=0.1155, over 3779773.91 frames. ], batch size: 54, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:36:03,628 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 13:36:04,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=22500.0, ans=0.0 2024-08-09 13:36:31,162 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:36:53,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=22700.0, ans=0.0059347826086956525 2024-08-09 13:36:56,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=22700.0, ans=0.2 2024-08-09 13:37:07,472 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-09 13:37:18,389 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
39 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 13:37:19,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22800.0, ans=0.1 2024-08-09 13:37:27,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=22800.0, ans=0.1 2024-08-09 13:37:43,217 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 13:37:43,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=22900.0, ans=0.005891304347826087 2024-08-09 13:37:45,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=23000.0, ans=0.07 2024-08-09 13:37:45,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.951e+01 3.575e+01 4.087e+01 9.473e+01, threshold=7.150e+01, percent-clipped=2.0 2024-08-09 13:37:45,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2300, loss[loss=0.127, beats_loss=0.01277, ecapa_loss=0.0009139, whisper_loss=0.1051, over 15507.00 frames. ], tot_loss[loss=0.1372, beats_loss=0.01394, ecapa_loss=0.0007215, whisper_loss=0.116, over 3812658.51 frames. ], batch size: 70, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:37:48,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=23000.0, ans=0.125 2024-08-09 13:38:17,351 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-09 13:38:22,504 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 13:38:37,586 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 13:38:41,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-08-09 13:38:43,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23300.0, ans=0.125 2024-08-09 13:38:44,755 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 13:38:55,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=23400.0, ans=0.1 2024-08-09 13:39:04,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2350, loss[loss=0.1404, beats_loss=0.01422, ecapa_loss=0.0006684, whisper_loss=0.1195, over 22717.00 frames. ], tot_loss[loss=0.1369, beats_loss=0.01384, ecapa_loss=0.0007127, whisper_loss=0.1159, over 3842920.95 frames. ], batch size: 90, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:39:30,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=23600.0, ans=0.125 2024-08-09 13:39:35,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=23700.0, ans=0.005717391304347826 2024-08-09 13:39:42,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=23700.0, ans=0.005717391304347826 2024-08-09 13:39:45,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23700.0, ans=0.125 2024-08-09 13:39:55,514 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 30 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 13:39:56,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. 
limit=15.0 2024-08-09 13:40:03,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2024-08-09 13:40:05,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=23800.0, ans=0.2 2024-08-09 13:40:11,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=23900.0, ans=0.5 2024-08-09 13:40:15,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23900.0, ans=0.125 2024-08-09 13:40:15,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=23900.0, ans=0.0 2024-08-09 13:40:23,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.811e+01 3.461e+01 4.504e+01 7.215e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-09 13:40:24,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2400, loss[loss=0.101, beats_loss=0.01486, ecapa_loss=0.0006752, whisper_loss=0.07935, over 14788.00 frames. ], tot_loss[loss=0.1375, beats_loss=0.01369, ecapa_loss=0.0007035, whisper_loss=0.1168, over 3873178.53 frames. ], batch size: 56, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:40:41,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2024-08-09 13:40:47,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-09 13:40:55,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. 
limit=15.0 2024-08-09 13:41:18,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=24300.0, ans=0.1 2024-08-09 13:41:23,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24400.0, ans=0.125 2024-08-09 13:41:36,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=24400.0, ans=0.2 2024-08-09 13:41:39,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2450, loss[loss=0.1715, beats_loss=0.01285, ecapa_loss=0.0006588, whisper_loss=0.1521, over 23434.00 frames. ], tot_loss[loss=0.1365, beats_loss=0.01371, ecapa_loss=0.000693, whisper_loss=0.1159, over 3862097.06 frames. ], batch size: 90, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:41:41,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24500.0, ans=0.125 2024-08-09 13:41:49,616 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 13:41:51,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24500.0, ans=0.125 2024-08-09 13:42:22,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=24800.0, ans=0.125 2024-08-09 13:42:29,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=24800.0, ans=0.0 2024-08-09 13:42:31,047 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 13:42:38,087 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 13:42:42,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24900.0, ans=0.125 2024-08-09 13:42:52,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.830e+01 3.469e+01 4.522e+01 1.002e+02, threshold=6.939e+01, percent-clipped=2.0 2024-08-09 13:42:52,639 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2500, loss[loss=0.1405, beats_loss=0.01475, ecapa_loss=0.0006037, whisper_loss=0.1197, over 21963.00 frames. ], tot_loss[loss=0.1359, beats_loss=0.01371, ecapa_loss=0.0006857, whisper_loss=0.1154, over 3876788.32 frames. ], batch size: 87, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:43:03,128 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 13:43:20,887 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 13:43:22,506 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 13:43:22,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25200.0, ans=0.1 2024-08-09 13:43:24,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.74 vs. limit=10.0 2024-08-09 13:43:30,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=25200.0, ans=0.2 2024-08-09 13:43:54,177 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
36 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 13:43:57,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25400.0, ans=0.1 2024-08-09 13:43:59,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=25400.0, ans=0.005347826086956522 2024-08-09 13:44:02,570 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 13:44:04,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=25400.0, ans=0.0 2024-08-09 13:44:08,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2550, loss[loss=0.1127, beats_loss=0.01283, ecapa_loss=0.0006281, whisper_loss=0.09356, over 17440.00 frames. ], tot_loss[loss=0.1358, beats_loss=0.01359, ecapa_loss=0.000682, whisper_loss=0.1154, over 3869566.78 frames. ], batch size: 67, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:44:10,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.03 vs. limit=22.5 2024-08-09 13:44:15,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.38 vs. limit=22.5 2024-08-09 13:44:21,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25600.0, ans=0.0 2024-08-09 13:44:33,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=25600.0, ans=0.0 2024-08-09 13:44:34,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. 
limit=22.5 2024-08-09 13:44:52,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25800.0, ans=0.0 2024-08-09 13:45:01,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.42 vs. limit=22.5 2024-08-09 13:45:09,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.23 vs. limit=22.5 2024-08-09 13:45:14,356 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 13:45:15,903 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 13:45:17,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=25900.0, ans=0.125 2024-08-09 13:45:21,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 3.019e+01 3.579e+01 4.793e+01 1.038e+02, threshold=7.158e+01, percent-clipped=5.0 2024-08-09 13:45:21,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2600, loss[loss=0.1524, beats_loss=0.01439, ecapa_loss=0.0005705, whisper_loss=0.1323, over 21960.00 frames. ], tot_loss[loss=0.1351, beats_loss=0.01363, ecapa_loss=0.0006733, whisper_loss=0.1147, over 3858024.57 frames. ], batch size: 86, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:45:23,066 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 13:45:55,468 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 32 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 13:46:18,738 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 13:46:21,294 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 13:46:24,773 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 13:46:26,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-08-09 13:46:29,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=26400.0, ans=0.125 2024-08-09 13:46:34,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2650, loss[loss=0.1468, beats_loss=0.01223, ecapa_loss=0.0005414, whisper_loss=0.1291, over 19101.00 frames. ], tot_loss[loss=0.1344, beats_loss=0.01362, ecapa_loss=0.0006636, whisper_loss=0.1142, over 3852152.29 frames. ], batch size: 70, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:46:49,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=26600.0, ans=0.2 2024-08-09 13:47:02,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=26700.0, ans=0.125 2024-08-09 13:47:08,024 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 13:47:15,634 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 13:47:20,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=26800.0, ans=0.2 2024-08-09 13:47:29,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26800.0, ans=0.0 2024-08-09 13:47:47,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.863e+01 3.296e+01 3.949e+01 7.406e+01, threshold=6.593e+01, percent-clipped=2.0 2024-08-09 13:47:47,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2700, loss[loss=0.1385, beats_loss=0.01282, ecapa_loss=0.0005951, whisper_loss=0.1197, over 19381.00 frames. ], tot_loss[loss=0.1337, beats_loss=0.01377, ecapa_loss=0.0006572, whisper_loss=0.1133, over 3846515.38 frames. ], batch size: 75, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:47:56,540 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-09 13:47:58,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-09 13:48:01,209 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 13:48:02,677 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 13:48:10,783 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 14 from Vox, 28 from AS 2024-08-09 13:48:42,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=27300.0, ans=0.2 2024-08-09 13:48:44,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=27300.0, ans=0.0 2024-08-09 13:48:45,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2024-08-09 13:48:46,484 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 from AS 2024-08-09 13:48:47,864 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 from AS 2024-08-09 13:48:55,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=27400.0, ans=0.125 2024-08-09 13:48:59,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-09 13:49:01,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2750, loss[loss=0.1374, beats_loss=0.0114, ecapa_loss=0.000708, whisper_loss=0.119, over 17783.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01384, ecapa_loss=0.0006448, whisper_loss=0.1122, over 3825518.67 frames. ], batch size: 70, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:49:28,858 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 26 from Vox, 28 from AS 2024-08-09 13:49:57,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.95 vs.
limit=15.0 2024-08-09 13:49:58,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=27800.0, ans=0.2 2024-08-09 13:50:01,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=12.0 2024-08-09 13:50:15,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=27900.0, ans=0.0 2024-08-09 13:50:15,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=27900.0, ans=0.125 2024-08-09 13:50:18,693 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 from AS 2024-08-09 13:50:19,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.874e+01 3.420e+01 4.195e+01 6.815e+01, threshold=6.839e+01, percent-clipped=2.0 2024-08-09 13:50:19,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2800, loss[loss=0.1211, beats_loss=0.01416, ecapa_loss=0.0006568, whisper_loss=0.1003, over 19072.00 frames. ], tot_loss[loss=0.1319, beats_loss=0.01392, ecapa_loss=0.0006428, whisper_loss=0.1115, over 3821731.66 frames. ], batch size: 78, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:50:41,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=28100.0, ans=0.125 2024-08-09 13:50:42,600 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-09 13:50:46,972 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
21 from LS+wenet, 16 from Vox, 32 from AS 2024-08-09 13:50:53,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28200.0, ans=0.125 2024-08-09 13:50:55,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=28200.0, ans=15.0 2024-08-09 13:50:58,293 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 from AS 2024-08-09 13:50:59,581 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-09 13:51:30,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28400.0, ans=0.1 2024-08-09 13:51:30,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=28400.0, ans=0.2 2024-08-09 13:51:38,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2850, loss[loss=0.09464, beats_loss=0.01583, ecapa_loss=0.0005055, whisper_loss=0.07376, over 18083.00 frames. ], tot_loss[loss=0.1318, beats_loss=0.01397, ecapa_loss=0.0006366, whisper_loss=0.1115, over 3854571.39 frames. ], batch size: 71, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:51:42,533 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 from AS 2024-08-09 13:51:52,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=28500.0, ans=0.004673913043478261 2024-08-09 13:51:52,380 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.397e+00 2024-08-09 13:51:54,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs.
limit=10.0 2024-08-09 13:52:06,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28600.0, ans=0.125 2024-08-09 13:52:06,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0 2024-08-09 13:52:08,035 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 13:52:15,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28700.0, ans=0.1 2024-08-09 13:52:17,952 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 from AS 2024-08-09 13:52:24,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=12.0 2024-08-09 13:52:33,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=28800.0, ans=0.125 2024-08-09 13:52:42,697 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-09 13:52:50,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=28900.0, ans=0.125 2024-08-09 13:52:58,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28900.0, ans=0.125 2024-08-09 13:53:00,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 3.002e+01 3.706e+01 4.572e+01 7.980e+01, threshold=7.411e+01, percent-clipped=5.0 2024-08-09 13:53:00,787 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2900, loss[loss=0.1411, beats_loss=0.01257, ecapa_loss=0.0007107, whisper_loss=0.1214, over 22150.00 frames.
], tot_loss[loss=0.1317, beats_loss=0.01388, ecapa_loss=0.0006347, whisper_loss=0.1115, over 3865047.83 frames. ], batch size: 91, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:53:13,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.65 vs. limit=22.5 2024-08-09 13:53:39,165 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 13 from LS+wenet, 24 from Vox, 41 from AS 2024-08-09 13:53:49,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.34 vs. limit=22.5 2024-08-09 13:53:51,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29300.0, ans=0.125 2024-08-09 13:53:55,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-09 13:54:01,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2024-08-09 13:54:06,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=29400.0, ans=0.125 2024-08-09 13:54:19,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 2950, loss[loss=0.1387, beats_loss=0.01226, ecapa_loss=0.0005758, whisper_loss=0.1207, over 18854.00 frames. ], tot_loss[loss=0.1322, beats_loss=0.0139, ecapa_loss=0.0006314, whisper_loss=0.112, over 3891251.73 frames. ], batch size: 73, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:54:23,884 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 from AS 2024-08-09 13:54:35,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs.
limit=15.0 2024-08-09 13:54:47,347 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-09 13:55:03,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=29700.0, ans=0.5 2024-08-09 13:55:39,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 3.111e+01 3.701e+01 4.234e+01 7.297e+01, threshold=7.402e+01, percent-clipped=0.0 2024-08-09 13:55:39,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3000, loss[loss=0.1294, beats_loss=0.01439, ecapa_loss=0.000579, whisper_loss=0.1093, over 15662.00 frames. ], tot_loss[loss=0.1314, beats_loss=0.01391, ecapa_loss=0.0006281, whisper_loss=0.1112, over 3916309.78 frames. ], batch size: 65, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:55:39,236 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 13:56:23,664 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on ASR_libri: loss=0.3107, beats_loss=0, ecapa_loss=0.001585, whisper_loss=0.2948, over 922467.00 frames. 2024-08-09 13:56:41,573 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on SV_voxceleb1: loss=0.0159, beats_loss=0, ecapa_loss=0.00159, whisper_loss=0, over 939242.00 frames. 2024-08-09 13:56:51,100 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6151, 1.3115, 1.5966, 1.6692], device='cuda:3') 2024-08-09 13:58:39,733 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on AT_audioset: loss=0.03327, beats_loss=0.03327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 13:58:39,737 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 13:58:41,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.31 vs.
limit=22.5 2024-08-09 13:58:41,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-08-09 13:58:50,088 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.097e+00 2024-08-09 13:58:50,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-09 13:59:04,831 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 13:59:05,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=30100.0, ans=0.2 2024-08-09 13:59:19,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=13.04 vs. limit=10.0 2024-08-09 13:59:35,689 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 from AS 2024-08-09 13:59:45,255 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 from AS 2024-08-09 13:59:48,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.31 vs. limit=22.5 2024-08-09 13:59:51,929 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
36 from LS+wenet, 25 from Vox, 27 from AS 2024-08-09 13:59:55,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=30400.0, ans=0.0 2024-08-09 14:00:04,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30500.0, ans=0.125 2024-08-09 14:00:04,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3050, loss[loss=0.1251, beats_loss=0.01122, ecapa_loss=0.0006344, whisper_loss=0.1076, over 15502.00 frames. ], tot_loss[loss=0.1315, beats_loss=0.01388, ecapa_loss=0.0006205, whisper_loss=0.1114, over 3926054.55 frames. ], batch size: 62, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:00:08,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=30500.0, ans=0.2 2024-08-09 14:00:10,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2024-08-09 14:00:27,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=30600.0, ans=0.125 2024-08-09 14:00:29,307 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 from AS 2024-08-09 14:00:31,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-08-09 14:00:48,607 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 from AS 2024-08-09 14:00:52,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs.
limit=22.5 2024-08-09 14:00:57,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=30800.0, ans=0.035 2024-08-09 14:00:59,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=30800.0, ans=0.004173913043478261 2024-08-09 14:01:07,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30900.0, ans=0.125 2024-08-09 14:01:18,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 3.101e+01 3.734e+01 4.761e+01 9.232e+01, threshold=7.468e+01, percent-clipped=3.0 2024-08-09 14:01:18,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3100, loss[loss=0.1291, beats_loss=0.01447, ecapa_loss=0.0006575, whisper_loss=0.1081, over 21392.00 frames. ], tot_loss[loss=0.1326, beats_loss=0.01388, ecapa_loss=0.0006135, whisper_loss=0.1126, over 3963366.70 frames. ], batch size: 91, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:01:19,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31000.0, ans=0.125 2024-08-09 14:01:33,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=31100.0, ans=0.2 2024-08-09 14:01:39,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=31100.0, ans=0.035 2024-08-09 14:01:50,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-09 14:01:50,974 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 from AS 2024-08-09 14:01:55,190 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
20 from LS+wenet, 28 from Vox, 33 from AS 2024-08-09 14:01:57,463 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 from AS 2024-08-09 14:01:58,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=15.0 2024-08-09 14:02:16,343 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS 2024-08-09 14:02:19,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=12.0 2024-08-09 14:02:23,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3150, loss[loss=0.1235, beats_loss=0.009825, ecapa_loss=0.0006976, whisper_loss=0.1067, over 19433.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.0138, ecapa_loss=0.0006124, whisper_loss=0.1122, over 3914727.30 frames. ], batch size: 75, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:02:29,181 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 14:02:38,870 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS 2024-08-09 14:02:40,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31600.0, ans=0.125 2024-08-09 14:02:46,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=31600.0, ans=0.125 2024-08-09 14:02:48,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=31600.0, ans=0.025 2024-08-09 14:02:49,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=31700.0, ans=0.0 2024-08-09 14:02:58,813 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
25 from LS+wenet, 30 from Vox, 36 from AS 2024-08-09 14:03:01,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=31700.0, ans=0.125 2024-08-09 14:03:11,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=31800.0, ans=0.125 2024-08-09 14:03:15,975 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS 2024-08-09 14:03:29,204 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 from AS 2024-08-09 14:03:30,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 3.005e+01 3.440e+01 4.161e+01 7.835e+01, threshold=6.880e+01, percent-clipped=1.0 2024-08-09 14:03:30,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3200, loss[loss=0.1649, beats_loss=0.01118, ecapa_loss=0.0005587, whisper_loss=0.1481, over 23551.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01375, ecapa_loss=0.0006064, whisper_loss=0.1127, over 3900645.37 frames. ], batch size: 89, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:03:35,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32000.0, ans=0.1 2024-08-09 14:03:40,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-09 14:03:42,845 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 14:03:45,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=32100.0, ans=0.05 2024-08-09 14:04:06,379 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 from AS 2024-08-09 14:04:07,646 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
29 from LS+wenet, 17 from Vox, 48 from AS 2024-08-09 14:04:17,085 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 from AS 2024-08-09 14:04:19,657 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 from AS 2024-08-09 14:04:36,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3250, loss[loss=0.131, beats_loss=0.01691, ecapa_loss=0.0005966, whisper_loss=0.1081, over 22556.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01374, ecapa_loss=0.0006039, whisper_loss=0.1128, over 3887513.45 frames. ], batch size: 92, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:05:11,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.28 vs. limit=6.0 2024-08-09 14:05:14,860 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 23 from Vox, 34 from AS 2024-08-09 14:05:42,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.060e+01 3.523e+01 4.253e+01 9.588e+01, threshold=7.047e+01, percent-clipped=8.0 2024-08-09 14:05:42,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3300, loss[loss=0.1183, beats_loss=0.01536, ecapa_loss=0.0006438, whisper_loss=0.0965, over 14765.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01369, ecapa_loss=0.0006011, whisper_loss=0.1128, over 3870035.89 frames. ], batch size: 62, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:05:50,190 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
23 from LS+wenet, 13 from Vox, 30 from AS 2024-08-09 14:06:02,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33100.0, ans=0.1 2024-08-09 14:06:37,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=33400.0, ans=0.125 2024-08-09 14:06:41,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=33400.0, ans=0.125 2024-08-09 14:06:44,639 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS 2024-08-09 14:06:47,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3350, loss[loss=0.09962, beats_loss=0.01445, ecapa_loss=0.0005413, whisper_loss=0.07976, over 18644.00 frames. ], tot_loss[loss=0.1323, beats_loss=0.01363, ecapa_loss=0.0005947, whisper_loss=0.1127, over 3902466.34 frames. ], batch size: 76, lr: 4.30e-02, grad_scale: 64.0 2024-08-09 14:06:51,345 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.690e-03 2024-08-09 14:06:58,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=33500.0, ans=0.05 2024-08-09 14:07:07,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=33600.0, ans=0.0 2024-08-09 14:07:11,816 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 14:07:18,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33700.0, ans=0.125 2024-08-09 14:07:19,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.04 vs.
limit=15.0 2024-08-09 14:07:30,450 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 35 from Vox, 35 from AS 2024-08-09 14:07:33,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=15.0 2024-08-09 14:07:45,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=33900.0, ans=0.0 2024-08-09 14:07:53,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 3.123e+01 3.529e+01 4.678e+01 1.147e+02, threshold=7.058e+01, percent-clipped=6.0 2024-08-09 14:07:53,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3400, loss[loss=0.1244, beats_loss=0.01373, ecapa_loss=0.0005592, whisper_loss=0.1051, over 20259.00 frames. ], tot_loss[loss=0.1319, beats_loss=0.01353, ecapa_loss=0.0005896, whisper_loss=0.1125, over 3914866.57 frames. ], batch size: 84, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:07:54,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=34000.0, ans=0.125 2024-08-09 14:08:09,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=34100.0, ans=0.125 2024-08-09 14:08:40,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34300.0, ans=0.125 2024-08-09 14:08:57,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3450, loss[loss=0.0909, beats_loss=0.01787, ecapa_loss=0.0004544, whisper_loss=0.06849, over 14467.00 frames. ], tot_loss[loss=0.1309, beats_loss=0.01355, ecapa_loss=0.0005884, whisper_loss=0.1114, over 3924463.80 frames.
], batch size: 56, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:09:02,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2024-08-09 14:09:12,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=34600.0, ans=0.125 2024-08-09 14:09:14,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.76 vs. limit=10.0 2024-08-09 14:09:17,554 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 from AS 2024-08-09 14:09:43,442 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS 2024-08-09 14:09:53,585 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 14:10:02,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.921e+01 3.468e+01 4.313e+01 8.519e+01, threshold=6.936e+01, percent-clipped=1.0 2024-08-09 14:10:02,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3500, loss[loss=0.1265, beats_loss=0.01442, ecapa_loss=0.0006532, whisper_loss=0.1055, over 20642.00 frames. ], tot_loss[loss=0.1307, beats_loss=0.01352, ecapa_loss=0.000587, whisper_loss=0.1113, over 3895449.99 frames. ], batch size: 87, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:10:13,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35000.0, ans=0.125 2024-08-09 14:10:19,980 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts.
16 from LS+wenet, 23 from Vox, 25 from AS 2024-08-09 14:10:31,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=35200.0, ans=0.125 2024-08-09 14:10:36,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35200.0, ans=0.1 2024-08-09 14:11:07,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3550, loss[loss=0.1289, beats_loss=0.01176, ecapa_loss=0.0006669, whisper_loss=0.1105, over 22313.00 frames. ], tot_loss[loss=0.1297, beats_loss=0.01355, ecapa_loss=0.0005815, whisper_loss=0.1103, over 3902666.63 frames. ], batch size: 92, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:11:16,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=35500.0, ans=0.0 2024-08-09 14:11:24,726 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 from AS 2024-08-09 14:11:27,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35600.0, ans=0.1 2024-08-09 14:11:31,024 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 from AS 2024-08-09 14:11:31,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=35600.0, ans=0.125 2024-08-09 14:11:39,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=35700.0, ans=0.125 2024-08-09 14:11:39,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2024-08-09 14:11:41,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.92 vs.
limit=22.5 2024-08-09 14:11:49,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=35800.0, ans=0.0 2024-08-09 14:11:55,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=35800.0, ans=0.125 2024-08-09 14:12:08,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=35900.0, ans=0.125 2024-08-09 14:12:10,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35900.0, ans=0.1 2024-08-09 14:12:13,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 3.146e+01 3.821e+01 4.721e+01 1.022e+02, threshold=7.642e+01, percent-clipped=5.0 2024-08-09 14:12:13,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3600, loss[loss=0.1289, beats_loss=0.01191, ecapa_loss=0.000673, whisper_loss=0.1103, over 22413.00 frames. ], tot_loss[loss=0.1293, beats_loss=0.01353, ecapa_loss=0.0005728, whisper_loss=0.11, over 3877522.53 frames. ], batch size: 92, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:12:21,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=22.5 2024-08-09 14:12:26,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-09 14:12:41,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=36200.0, ans=0.125 2024-08-09 14:12:42,809 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 20 from Vox, 31 from AS 2024-08-09 14:12:47,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=36200.0, ans=0.0 2024-08-09 14:12:50,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=36200.0, ans=0.0 2024-08-09 14:12:59,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=36300.0, ans=0.0 2024-08-09 14:13:07,941 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 from AS 2024-08-09 14:13:10,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=36400.0, ans=0.2 2024-08-09 14:13:12,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2024-08-09 14:13:19,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3650, loss[loss=0.1273, beats_loss=0.01328, ecapa_loss=0.0005364, whisper_loss=0.1086, over 22978.00 frames. ], tot_loss[loss=0.1295, beats_loss=0.01349, ecapa_loss=0.0005727, whisper_loss=0.1103, over 3861950.93 frames. ], batch size: 93, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:13:21,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.94 vs.
limit=22.5 2024-08-09 14:13:42,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36600.0, ans=0.1 2024-08-09 14:13:55,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=36700.0, ans=0.04949747468305833 2024-08-09 14:14:07,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36800.0, ans=0.1 2024-08-09 14:14:16,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2024-08-09 14:14:19,978 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 from AS 2024-08-09 14:14:22,487 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 13 from Vox, 42 from AS 2024-08-09 14:14:24,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.925e+01 3.373e+01 4.021e+01 6.000e+01, threshold=6.747e+01, percent-clipped=0.0 2024-08-09 14:14:24,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3700, loss[loss=0.1337, beats_loss=0.01159, ecapa_loss=0.0007157, whisper_loss=0.1149, over 17818.00 frames. ], tot_loss[loss=0.1291, beats_loss=0.01351, ecapa_loss=0.0005717, whisper_loss=0.1099, over 3816956.12 frames. ], batch size: 76, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:14:30,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37000.0, ans=0.125 2024-08-09 14:14:33,023 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 from AS 2024-08-09 14:14:40,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.19 vs.
limit=12.0 2024-08-09 14:15:02,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37200.0, ans=0.1 2024-08-09 14:15:08,653 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 from AS 2024-08-09 14:15:14,981 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS 2024-08-09 14:15:16,564 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 from AS 2024-08-09 14:15:21,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=37400.0, ans=0.125 2024-08-09 14:15:25,084 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-09 14:15:25,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=37400.0, ans=0.125 2024-08-09 14:15:25,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=37400.0, ans=0.0 2024-08-09 14:15:30,207 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3750, loss[loss=0.1121, beats_loss=0.01383, ecapa_loss=0.0007089, whisper_loss=0.09118, over 15706.00 frames. ], tot_loss[loss=0.1291, beats_loss=0.01365, ecapa_loss=0.0005657, whisper_loss=0.1098, over 3845389.15 frames. ], batch size: 68, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:15:30,339 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-09 14:15:32,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=37500.0, ans=0.125 2024-08-09 14:15:38,437 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-09 14:15:48,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-08-09 14:15:49,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=37600.0, ans=0.2 2024-08-09 14:15:50,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2024-08-09 14:15:52,249 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 from AS 2024-08-09 14:15:59,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37700.0, ans=0.1 2024-08-09 14:16:31,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=37900.0, ans=0.125 2024-08-09 14:16:36,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.197e+01 3.801e+01 4.581e+01 9.571e+01, threshold=7.603e+01, percent-clipped=5.0 2024-08-09 14:16:36,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3800, loss[loss=0.1367, beats_loss=0.01359, ecapa_loss=0.0005157, whisper_loss=0.118, over 20841.00 frames. ], tot_loss[loss=0.1278, beats_loss=0.0138, ecapa_loss=0.0005608, whisper_loss=0.1084, over 3853432.13 frames.
], batch size: 82, lr: 4.25e-02, grad_scale: 64.0 2024-08-09 14:16:39,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=38000.0, ans=0.125 2024-08-09 14:16:45,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=38000.0, ans=0.125 2024-08-09 14:16:48,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=38100.0, ans=0.07 2024-08-09 14:17:13,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=38200.0, ans=0.0 2024-08-09 14:17:22,911 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 31 from LS+wenet, 16 from Vox, 25 from AS 2024-08-09 14:17:28,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38400.0, ans=0.125 2024-08-09 14:17:33,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.00 vs. limit=22.5 2024-08-09 14:17:37,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=38400.0, ans=0.125 2024-08-09 14:17:39,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=38400.0, ans=0.0 2024-08-09 14:17:40,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-09 14:17:41,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3850, loss[loss=0.1375, beats_loss=0.01526, ecapa_loss=0.0005178, whisper_loss=0.117, over 22675.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.01377, ecapa_loss=0.0005587, whisper_loss=0.1092, over 3874187.55 frames.
], batch size: 90, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:17:48,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=38500.0, ans=0.125 2024-08-09 14:17:54,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=38600.0, ans=0.002478260869565218 2024-08-09 14:17:57,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=12.0 2024-08-09 14:17:59,730 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 from AS 2024-08-09 14:18:01,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=38600.0, ans=0.125 2024-08-09 14:18:09,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2024-08-09 14:18:25,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38800.0, ans=0.1 2024-08-09 14:18:49,356 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 3.021e+01 3.699e+01 4.570e+01 7.428e+01, threshold=7.398e+01, percent-clipped=0.0 2024-08-09 14:18:49,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3900, loss[loss=0.1333, beats_loss=0.01462, ecapa_loss=0.0005439, whisper_loss=0.1133, over 15326.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01376, ecapa_loss=0.0005604, whisper_loss=0.1093, over 3882415.24 frames. ], batch size: 59, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:18:57,237 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 14:19:02,587 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
15 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 14:19:12,360 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 from AS 2024-08-09 14:19:16,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=39200.0, ans=0.125 2024-08-09 14:19:46,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=39400.0, ans=0.025 2024-08-09 14:19:50,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=39400.0, ans=0.0 2024-08-09 14:19:53,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=39500.0, ans=0.0 2024-08-09 14:19:53,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 3950, loss[loss=0.1356, beats_loss=0.01477, ecapa_loss=0.0005521, whisper_loss=0.1153, over 21994.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01372, ecapa_loss=0.0005582, whisper_loss=0.1093, over 3907703.14 frames. ], batch size: 91, lr: 4.23e-02, grad_scale: 64.0 2024-08-09 14:19:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=39500.0, ans=0.125 2024-08-09 14:20:12,606 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 17 from Vox, 17 from AS 2024-08-09 14:20:17,832 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS 2024-08-09 14:20:22,525 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.696e-01 2024-08-09 14:20:25,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=39700.0, ans=0.2 2024-08-09 14:20:43,997 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
23 from LS+wenet, 15 from Vox, 18 from AS 2024-08-09 14:20:53,073 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-09 14:20:55,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-09 14:20:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=39900.0, ans=0.2 2024-08-09 14:21:01,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=39900.0, ans=0.125 2024-08-09 14:21:12,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.104e+01 3.769e+01 4.628e+01 7.300e+01, threshold=7.538e+01, percent-clipped=0.0 2024-08-09 14:21:12,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4000, loss[loss=0.1092, beats_loss=0.01384, ecapa_loss=0.0006355, whisper_loss=0.08897, over 18286.00 frames. ], tot_loss[loss=0.128, beats_loss=0.01378, ecapa_loss=0.0005562, whisper_loss=0.1086, over 3883200.50 frames. ], batch size: 79, lr: 4.23e-02, grad_scale: 128.0 2024-08-09 14:21:17,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=12.0 2024-08-09 14:21:19,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=40000.0, ans=0.2 2024-08-09 14:21:37,475 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
25 from LS+wenet, 19 from Vox, 46 from AS 2024-08-09 14:21:43,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=40200.0, ans=0.125 2024-08-09 14:22:15,734 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.618e-01 2024-08-09 14:22:20,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40400.0, ans=0.0 2024-08-09 14:22:23,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4050, loss[loss=0.1274, beats_loss=0.01299, ecapa_loss=0.0004635, whisper_loss=0.1097, over 19379.00 frames. ], tot_loss[loss=0.1282, beats_loss=0.01376, ecapa_loss=0.0005504, whisper_loss=0.1089, over 3899380.62 frames. ], batch size: 73, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:22:31,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=40500.0, ans=0.05 2024-08-09 14:22:34,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=40500.0, ans=0.125 2024-08-09 14:22:36,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=40600.0, ans=0.0 2024-08-09 14:22:37,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2024-08-09 14:22:42,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=40600.0, ans=0.125 2024-08-09 14:22:45,253 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
19 from LS+wenet, 23 from Vox, 36 from AS 2024-08-09 14:22:48,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=40700.0, ans=0.025 2024-08-09 14:22:54,640 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-09 14:22:54,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=40700.0, ans=0.0 2024-08-09 14:23:00,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-09 14:23:06,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=40800.0, ans=0.07 2024-08-09 14:23:12,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=40800.0, ans=0.125 2024-08-09 14:23:15,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=40900.0, ans=0.0 2024-08-09 14:23:19,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=40900.0, ans=0.0 2024-08-09 14:23:28,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.975e+01 3.511e+01 4.257e+01 6.601e+01, threshold=7.021e+01, percent-clipped=0.0 2024-08-09 14:23:28,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4100, loss[loss=0.09404, beats_loss=0.01348, ecapa_loss=0.0007191, whisper_loss=0.07337, over 16686.00 frames. ], tot_loss[loss=0.1282, beats_loss=0.01365, ecapa_loss=0.0005465, whisper_loss=0.1091, over 3897847.42 frames.
], batch size: 73, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:23:36,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.89 vs. limit=22.5 2024-08-09 14:24:16,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=41300.0, ans=0.5 2024-08-09 14:24:18,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=41300.0, ans=0.2 2024-08-09 14:24:25,420 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 from AS 2024-08-09 14:24:33,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4150, loss[loss=0.1218, beats_loss=0.01341, ecapa_loss=0.0004633, whisper_loss=0.1038, over 21807.00 frames. ], tot_loss[loss=0.1272, beats_loss=0.01358, ecapa_loss=0.0005473, whisper_loss=0.1082, over 3877243.73 frames. ], batch size: 83, lr: 4.21e-02, grad_scale: 128.0 2024-08-09 14:24:42,453 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 14:25:13,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2024-08-09 14:25:16,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41800.0, ans=0.1 2024-08-09 14:25:27,578 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 14:25:37,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.939e+01 3.388e+01 4.308e+01 6.716e+01, threshold=6.777e+01, percent-clipped=0.0 2024-08-09 14:25:37,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4200, loss[loss=0.1171, beats_loss=0.01255, ecapa_loss=0.0005863, whisper_loss=0.0987, over 16090.00 frames.
], tot_loss[loss=0.1269, beats_loss=0.01361, ecapa_loss=0.0005448, whisper_loss=0.1078, over 3861910.56 frames. ], batch size: 68, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:26:10,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-09 14:26:28,401 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 from AS 2024-08-09 14:26:36,967 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 from AS 2024-08-09 14:26:37,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=42400.0, ans=0.125 2024-08-09 14:26:41,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4250, loss[loss=0.1123, beats_loss=0.01673, ecapa_loss=0.0004916, whisper_loss=0.09069, over 21610.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.01367, ecapa_loss=0.0005376, whisper_loss=0.1075, over 3871993.22 frames. ], batch size: 87, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:26:55,191 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 from AS 2024-08-09 14:27:00,607 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:27:23,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=42800.0, ans=0.2 2024-08-09 14:27:29,825 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 14:27:35,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=42900.0, ans=0.0 2024-08-09 14:27:36,101 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
19 from LS+wenet, 17 from Vox, 24 from AS 2024-08-09 14:27:40,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=42900.0, ans=0.0 2024-08-09 14:27:46,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 3.006e+01 3.697e+01 4.408e+01 8.760e+01, threshold=7.393e+01, percent-clipped=1.0 2024-08-09 14:27:46,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4300, loss[loss=0.1368, beats_loss=0.01309, ecapa_loss=0.0004702, whisper_loss=0.119, over 24124.00 frames. ], tot_loss[loss=0.1261, beats_loss=0.0137, ecapa_loss=0.0005335, whisper_loss=0.1071, over 3883966.64 frames. ], batch size: 93, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:15,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=43200.0, ans=0.001478260869565217 2024-08-09 14:28:17,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43200.0, ans=0.125 2024-08-09 14:28:18,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=43200.0, ans=0.125 2024-08-09 14:28:33,745 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 from AS 2024-08-09 14:28:38,720 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 14:28:44,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-08-09 14:28:49,218 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 14:28:51,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4350, loss[loss=0.141, beats_loss=0.01037, ecapa_loss=0.0005781, whisper_loss=0.1249, over 22937.00 frames.
], tot_loss[loss=0.1265, beats_loss=0.0135, ecapa_loss=0.0005332, whisper_loss=0.1077, over 3863419.26 frames. ], batch size: 91, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:29:03,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=43600.0, ans=0.125 2024-08-09 14:29:09,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43600.0, ans=0.1 2024-08-09 14:29:15,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=43600.0, ans=0.125 2024-08-09 14:29:21,420 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS 2024-08-09 14:29:21,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=43700.0, ans=0.02 2024-08-09 14:29:23,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=22.5 2024-08-09 14:29:48,622 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 from AS 2024-08-09 14:29:48,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=43900.0, ans=0.2 2024-08-09 14:29:57,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.950e+01 3.412e+01 4.173e+01 7.476e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-09 14:29:57,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4400, loss[loss=0.1445, beats_loss=0.01474, ecapa_loss=0.0004875, whisper_loss=0.1248, over 15306.00 frames. ], tot_loss[loss=0.1271, beats_loss=0.01344, ecapa_loss=0.0005299, whisper_loss=0.1084, over 3863186.45 frames.
], batch size: 58, lr: 4.18e-02, grad_scale: 128.0 2024-08-09 14:29:59,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-08-09 14:30:01,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=44000.0, ans=0.05 2024-08-09 14:30:03,529 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 14:30:05,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=44000.0, ans=0.125 2024-08-09 14:30:13,247 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 38 from LS+wenet, 18 from Vox, 26 from AS 2024-08-09 14:30:14,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=44100.0, ans=0.0 2024-08-09 14:30:21,142 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 14:30:33,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=44200.0, ans=0.125 2024-08-09 14:30:53,440 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 27 from Vox, 43 from AS 2024-08-09 14:31:05,416 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-09 14:31:21,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-09 14:31:22,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4450, loss[loss=0.1245, beats_loss=0.01252, ecapa_loss=0.0005033, whisper_loss=0.107, over 19359.00 frames. ], tot_loss[loss=0.1276, beats_loss=0.01341, ecapa_loss=0.0005273, whisper_loss=0.109, over 3894371.10 frames.
], batch size: 77, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:31:23,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2024-08-09 14:31:29,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=44500.0, ans=0.1 2024-08-09 14:32:06,376 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS 2024-08-09 14:32:35,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=44900.0, ans=0.125 2024-08-09 14:32:44,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44900.0, ans=0.1 2024-08-09 14:32:47,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=45000.0, ans=0.125 2024-08-09 14:32:48,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 3.039e+01 3.733e+01 4.656e+01 8.279e+01, threshold=7.465e+01, percent-clipped=2.0 2024-08-09 14:32:48,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4500, loss[loss=0.1253, beats_loss=0.01361, ecapa_loss=0.0005823, whisper_loss=0.1059, over 21812.00 frames. ], tot_loss[loss=0.1277, beats_loss=0.01344, ecapa_loss=0.0005241, whisper_loss=0.109, over 3888149.40 frames. ], batch size: 91, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:32:55,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45000.0, ans=0.1 2024-08-09 14:33:13,733 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-09 14:33:40,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=15.0 2024-08-09 14:33:45,194 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 14:33:54,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0 2024-08-09 14:34:11,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4550, loss[loss=0.08465, beats_loss=0.01477, ecapa_loss=0.0004632, whisper_loss=0.06525, over 15578.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01357, ecapa_loss=0.0005221, whisper_loss=0.1074, over 3869976.97 frames. ], batch size: 62, lr: 4.16e-02, grad_scale: 128.0 2024-08-09 14:34:18,398 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 from AS 2024-08-09 14:34:23,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=45500.0, ans=0.0009782608695652183 2024-08-09 14:34:26,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=45600.0, ans=0.125 2024-08-09 14:34:26,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=45600.0, ans=0.125 2024-08-09 14:34:30,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=45600.0, ans=0.125 2024-08-09 14:35:00,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=45800.0, ans=0.0 2024-08-09 14:35:02,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs,
batch_count=45800.0, ans=0.5 2024-08-09 14:35:10,526 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-09 14:35:32,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.954e+01 3.369e+01 4.036e+01 7.171e+01, threshold=6.737e+01, percent-clipped=0.0 2024-08-09 14:35:32,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4600, loss[loss=0.1387, beats_loss=0.01118, ecapa_loss=0.0005003, whisper_loss=0.1225, over 22794.00 frames. ], tot_loss[loss=0.1261, beats_loss=0.01355, ecapa_loss=0.0005196, whisper_loss=0.1073, over 3847902.86 frames. ], batch size: 89, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:35:33,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46000.0, ans=0.125 2024-08-09 14:35:35,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.02 vs. limit=10.0 2024-08-09 14:35:46,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=46000.0, ans=0.125 2024-08-09 14:35:55,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=46100.0, ans=0.0 2024-08-09 14:35:57,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=46100.0, ans=0.125 2024-08-09 14:36:17,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs.
limit=15.0 2024-08-09 14:36:49,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46400.0, ans=0.125 2024-08-09 14:36:54,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4650, loss[loss=0.1176, beats_loss=0.01088, ecapa_loss=0.0006806, whisper_loss=0.09987, over 15156.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01355, ecapa_loss=0.000522, whisper_loss=0.1078, over 3861964.19 frames. ], batch size: 62, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:37:24,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2024-08-09 14:37:59,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=46800.0, ans=0.125 2024-08-09 14:38:02,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=46900.0, ans=0.0 2024-08-09 14:38:17,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 3.046e+01 3.609e+01 4.617e+01 7.306e+01, threshold=7.217e+01, percent-clipped=2.0 2024-08-09 14:38:17,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4700, loss[loss=0.1324, beats_loss=0.0122, ecapa_loss=0.0005115, whisper_loss=0.115, over 23150.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.0136, ecapa_loss=0.0005158, whisper_loss=0.1076, over 3836138.19 frames. ], batch size: 94, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:38:49,713 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS 2024-08-09 14:39:06,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2024-08-09 14:39:18,520 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
24 from LS+wenet, 25 from Vox, 28 from AS 2024-08-09 14:39:18,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47300.0, ans=0.0 2024-08-09 14:39:27,931 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS 2024-08-09 14:39:28,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=47400.0, ans=0.125 2024-08-09 14:39:30,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=15.0 2024-08-09 14:39:39,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=59.60 vs. limit=22.5 2024-08-09 14:39:43,283 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4750, loss[loss=0.1398, beats_loss=0.01237, ecapa_loss=0.0004593, whisper_loss=0.1228, over 23181.00 frames. ], tot_loss[loss=0.1267, beats_loss=0.01362, ecapa_loss=0.0005156, whisper_loss=0.1079, over 3850374.39 frames. ], batch size: 89, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:39:47,338 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS 2024-08-09 14:40:09,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5 2024-08-09 14:40:36,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47800.0, ans=0.1 2024-08-09 14:40:39,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47800.0, ans=0.0 2024-08-09 14:40:41,437 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
25 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 14:41:04,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=48000.0, ans=0.0 2024-08-09 14:41:05,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.158e+01 3.572e+01 4.344e+01 1.074e+02, threshold=7.144e+01, percent-clipped=1.0 2024-08-09 14:41:05,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4800, loss[loss=0.1152, beats_loss=0.01468, ecapa_loss=0.0004761, whisper_loss=0.09578, over 21950.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01368, ecapa_loss=0.0005134, whisper_loss=0.1072, over 3867776.79 frames. ], batch size: 87, lr: 4.13e-02, grad_scale: 128.0 2024-08-09 14:41:12,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.23 vs. limit=10.0 2024-08-09 14:41:45,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. limit=10.0 2024-08-09 14:41:55,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2024-08-09 14:42:02,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=48300.0, ans=0.0 2024-08-09 14:42:17,364 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.258e-01 2024-08-09 14:42:27,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2024-08-09 14:42:31,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4850, loss[loss=0.123, beats_loss=0.01114, ecapa_loss=0.000584, whisper_loss=0.106, over 17503.00 frames.
], tot_loss[loss=0.1259, beats_loss=0.01372, ecapa_loss=0.0005119, whisper_loss=0.1071, over 3890534.82 frames. ], batch size: 72, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:42:49,548 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-09 14:43:08,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=48700.0, ans=0.125 2024-08-09 14:44:00,246 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.275e+01 3.682e+01 4.305e+01 7.376e+01, threshold=7.365e+01, percent-clipped=1.0 2024-08-09 14:44:00,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4900, loss[loss=0.1243, beats_loss=0.01441, ecapa_loss=0.0004335, whisper_loss=0.1055, over 22265.00 frames. ], tot_loss[loss=0.1259, beats_loss=0.01381, ecapa_loss=0.0005092, whisper_loss=0.107, over 3914353.44 frames. ], batch size: 88, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:44:26,253 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 14:44:35,314 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 from AS 2024-08-09 14:44:39,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=49200.0, ans=0.2 2024-08-09 14:44:42,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=12.0 2024-08-09 14:44:52,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=49300.0, ans=0.07 2024-08-09 14:45:07,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.37 vs.
limit=15.0 2024-08-09 14:45:26,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 4950, loss[loss=0.1399, beats_loss=0.01167, ecapa_loss=0.0005196, whisper_loss=0.123, over 14094.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.01378, ecapa_loss=0.0005078, whisper_loss=0.1069, over 3911366.49 frames. ], batch size: 56, lr: 4.11e-02, grad_scale: 128.0 2024-08-09 14:45:32,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=49500.0, ans=0.5 2024-08-09 14:46:01,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=49700.0, ans=0.2 2024-08-09 14:46:13,150 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 from AS 2024-08-09 14:46:20,528 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 from AS 2024-08-09 14:46:52,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 3.042e+01 3.499e+01 4.372e+01 7.194e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-09 14:46:52,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5000, loss[loss=0.1121, beats_loss=0.01307, ecapa_loss=0.0005244, whisper_loss=0.09383, over 18402.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01374, ecapa_loss=0.0005065, whisper_loss=0.1066, over 3881807.68 frames.
], batch size: 78, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:46:56,953 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:47:02,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=50000.0, ans=0.07 2024-08-09 14:47:11,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=50100.0, ans=0.0 2024-08-09 14:47:16,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=50100.0, ans=0.2 2024-08-09 14:47:21,493 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.306e+00 2024-08-09 14:47:41,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=50300.0, ans=0.2 2024-08-09 14:47:51,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=50300.0, ans=0.0 2024-08-09 14:47:54,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-09 14:47:55,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-09 14:47:56,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2024-08-09 14:48:04,963 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 from AS 2024-08-09 14:48:08,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5050, loss[loss=0.1282, beats_loss=0.01247, ecapa_loss=0.0004446, whisper_loss=0.1113, over 18144.00 frames.
], tot_loss[loss=0.126, beats_loss=0.01373, ecapa_loss=0.0005036, whisper_loss=0.1072, over 3912109.86 frames. ], batch size: 69, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:48:29,299 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-09 14:48:33,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=50600.0, ans=0.0 2024-08-09 14:48:36,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-09 14:48:38,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2024-08-09 14:48:39,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-09 14:48:47,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=15.0 2024-08-09 14:48:48,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=50800.0, ans=0.125 2024-08-09 14:48:57,965 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 from AS 2024-08-09 14:49:02,453 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS 2024-08-09 14:49:05,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=50900.0, ans=0.0 2024-08-09 14:49:09,022 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-09 14:49:15,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 3.052e+01 3.532e+01 4.388e+01 7.103e+01, threshold=7.064e+01, percent-clipped=2.0 2024-08-09 14:49:15,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5100, loss[loss=0.1067, beats_loss=0.01601, ecapa_loss=0.0005075, whisper_loss=0.08563, over 18742.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.01367, ecapa_loss=0.0005014, whisper_loss=0.1079, over 3901775.51 frames. ], batch size: 79, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:49:18,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.54 vs. limit=15.0 2024-08-09 14:49:27,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=51100.0, ans=0.0 2024-08-09 14:49:28,394 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 from AS 2024-08-09 14:49:32,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=51100.0, ans=0.125 2024-08-09 14:49:59,774 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 14 from Vox, 33 from AS 2024-08-09 14:50:00,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=51300.0, ans=0.05 2024-08-09 14:50:08,798 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts.
15 from LS+wenet, 16 from Vox, 22 from AS 2024-08-09 14:50:09,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51400.0, ans=0.0 2024-08-09 14:50:18,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=51400.0, ans=0.125 2024-08-09 14:50:20,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5150, loss[loss=0.1308, beats_loss=0.01283, ecapa_loss=0.0004741, whisper_loss=0.1132, over 22830.00 frames. ], tot_loss[loss=0.1267, beats_loss=0.01355, ecapa_loss=0.0005002, whisper_loss=0.1082, over 3891999.67 frames. ], batch size: 91, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:50:31,417 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 from AS 2024-08-09 14:50:39,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=51600.0, ans=0.2 2024-08-09 14:50:39,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=15.0 2024-08-09 14:50:45,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=51700.0, ans=0.0 2024-08-09 14:50:48,204 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 from AS 2024-08-09 14:50:49,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=51700.0, ans=0.125 2024-08-09 14:51:10,595 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 17 from Vox, 28 from AS 2024-08-09 14:51:17,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.56 vs.
limit=15.0 2024-08-09 14:51:25,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.954e+01 3.465e+01 4.225e+01 6.973e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 14:51:25,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5200, loss[loss=0.1183, beats_loss=0.01262, ecapa_loss=0.0005365, whisper_loss=0.1003, over 21504.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.01359, ecapa_loss=0.0004944, whisper_loss=0.1079, over 3904334.05 frames. ], batch size: 89, lr: 4.08e-02, grad_scale: 128.0 2024-08-09 14:51:30,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=52000.0, ans=0.1 2024-08-09 14:51:36,625 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 22 from Vox, 28 from AS 2024-08-09 14:52:10,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2024-08-09 14:52:12,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=52300.0, ans=0.0 2024-08-09 14:52:28,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5250, loss[loss=0.1166, beats_loss=0.01302, ecapa_loss=0.0004807, whisper_loss=0.09875, over 17210.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01353, ecapa_loss=0.0004927, whisper_loss=0.1072, over 3881264.43 frames. ], batch size: 69, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:52:44,615 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-09 14:52:45,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.09 vs.
limit=22.5 2024-08-09 14:52:54,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=52700.0, ans=10.0 2024-08-09 14:53:20,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=52900.0, ans=0.125 2024-08-09 14:53:22,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=52900.0, ans=10.0 2024-08-09 14:53:33,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.986e+01 3.430e+01 3.984e+01 5.910e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 14:53:33,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5300, loss[loss=0.1053, beats_loss=0.01473, ecapa_loss=0.0005811, whisper_loss=0.08477, over 21682.00 frames. ], tot_loss[loss=0.1253, beats_loss=0.01353, ecapa_loss=0.0004902, whisper_loss=0.1069, over 3863198.57 frames. ], batch size: 94, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:53:38,663 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 from AS 2024-08-09 14:53:41,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-09 14:53:45,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2024-08-09 14:54:01,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=53200.0, ans=0.125 2024-08-09 14:54:03,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.48 vs.
limit=22.5 2024-08-09 14:54:04,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=53200.0, ans=0.0 2024-08-09 14:54:12,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:54:21,512 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 from AS 2024-08-09 14:54:32,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=53400.0, ans=0.95 2024-08-09 14:54:38,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5350, loss[loss=0.1189, beats_loss=0.0148, ecapa_loss=0.000574, whisper_loss=0.09838, over 12998.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.01343, ecapa_loss=0.0004897, whisper_loss=0.1074, over 3854772.72 frames. ], batch size: 55, lr: 4.06e-02, grad_scale: 128.0 2024-08-09 14:54:43,751 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 from AS 2024-08-09 14:54:44,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2024-08-09 14:54:50,213 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 14:55:06,971 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 14:55:12,281 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 14 from Vox, 41 from AS 2024-08-09 14:55:19,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5 2024-08-09 14:55:21,511 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 from AS 2024-08-09 14:55:34,224 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
27 from LS+wenet, 14 from Vox, 30 from AS 2024-08-09 14:55:43,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.073e+01 3.494e+01 4.285e+01 8.308e+01, threshold=6.988e+01, percent-clipped=2.0 2024-08-09 14:55:43,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5400, loss[loss=0.1258, beats_loss=0.01257, ecapa_loss=0.0004944, whisper_loss=0.1082, over 22828.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01343, ecapa_loss=0.0004867, whisper_loss=0.1079, over 3838524.83 frames. ], batch size: 91, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:55:44,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=54000.0, ans=0.05 2024-08-09 14:55:51,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54000.0, ans=0.125 2024-08-09 14:55:52,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-08-09 14:55:57,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=54100.0, ans=0.0 2024-08-09 14:56:01,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=54100.0, ans=0.125 2024-08-09 14:56:08,355 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 from AS 2024-08-09 14:56:12,097 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 13 from Vox, 31 from AS 2024-08-09 14:56:30,130 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS 2024-08-09 14:56:36,780 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
19 from LS+wenet, 22 from Vox, 35 from AS 2024-08-09 14:56:41,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=54400.0, ans=0.125 2024-08-09 14:56:44,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-09 14:56:47,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5450, loss[loss=0.1242, beats_loss=0.01504, ecapa_loss=0.0004774, whisper_loss=0.1043, over 18811.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01346, ecapa_loss=0.0004857, whisper_loss=0.1077, over 3842198.29 frames. ], batch size: 80, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:57:10,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=54600.0, ans=0.125 2024-08-09 14:57:14,075 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 14:57:20,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54700.0, ans=0.1 2024-08-09 14:57:31,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=54800.0, ans=15.0 2024-08-09 14:57:41,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54900.0, ans=0.125 2024-08-09 14:57:46,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs.
limit=15.0 2024-08-09 14:57:51,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 3.087e+01 3.659e+01 4.293e+01 7.884e+01, threshold=7.318e+01, percent-clipped=2.0 2024-08-09 14:57:51,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5500, loss[loss=0.1246, beats_loss=0.01579, ecapa_loss=0.0003917, whisper_loss=0.1049, over 17119.00 frames. ], tot_loss[loss=0.126, beats_loss=0.01348, ecapa_loss=0.000481, whisper_loss=0.1078, over 3876066.15 frames. ], batch size: 66, lr: 4.04e-02, grad_scale: 128.0 2024-08-09 14:58:07,526 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 from AS 2024-08-09 14:58:17,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-09 14:58:18,857 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 14 from Vox, 48 from AS 2024-08-09 14:58:43,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2024-08-09 14:58:53,116 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 from AS 2024-08-09 14:58:55,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5550, loss[loss=0.119, beats_loss=0.01535, ecapa_loss=0.0004778, whisper_loss=0.09891, over 13154.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01343, ecapa_loss=0.0004835, whisper_loss=0.1083, over 3854748.27 frames. ], batch size: 54, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 14:59:09,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=55600.0, ans=0.125 2024-08-09 14:59:14,971 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
29 from LS+wenet, 29 from Vox, 35 from AS 2024-08-09 14:59:30,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0 2024-08-09 14:59:32,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=55800.0, ans=0.125 2024-08-09 14:59:43,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=55800.0, ans=0.125 2024-08-09 14:59:50,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=55900.0, ans=0.125 2024-08-09 14:59:53,003 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS 2024-08-09 14:59:59,655 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.194e+01 3.634e+01 4.385e+01 7.525e+01, threshold=7.268e+01, percent-clipped=1.0 2024-08-09 14:59:59,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5600, loss[loss=0.1268, beats_loss=0.01345, ecapa_loss=0.0004751, whisper_loss=0.1086, over 22982.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.01352, ecapa_loss=0.000482, whisper_loss=0.1074, over 3870721.40 frames. ], batch size: 92, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 15:00:00,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2024-08-09 15:00:07,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.00 vs. limit=22.5 2024-08-09 15:00:19,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs.
limit=6.0 2024-08-09 15:00:25,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5 2024-08-09 15:00:35,397 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 15:00:45,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=56300.0, ans=15.0 2024-08-09 15:00:50,955 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 from AS 2024-08-09 15:00:52,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0 2024-08-09 15:01:03,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5650, loss[loss=0.1183, beats_loss=0.01438, ecapa_loss=0.0005304, whisper_loss=0.09859, over 21329.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01349, ecapa_loss=0.0004836, whisper_loss=0.1074, over 3896547.35 frames. ], batch size: 88, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:01:08,976 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS 2024-08-09 15:01:13,203 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS 2024-08-09 15:01:19,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56600.0, ans=0.0 2024-08-09 15:01:36,350 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-09 15:01:37,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.97 vs.
limit=22.5 2024-08-09 15:01:39,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-08-09 15:01:40,440 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 from AS 2024-08-09 15:01:58,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2024-08-09 15:02:03,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=56900.0, ans=0.125 2024-08-09 15:02:08,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.137e+01 3.741e+01 4.572e+01 6.525e+01, threshold=7.481e+01, percent-clipped=0.0 2024-08-09 15:02:08,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5700, loss[loss=0.1331, beats_loss=0.01254, ecapa_loss=0.0004083, whisper_loss=0.1165, over 17471.00 frames. ], tot_loss[loss=0.1253, beats_loss=0.01353, ecapa_loss=0.0004845, whisper_loss=0.107, over 3911761.64 frames. ], batch size: 66, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:02:20,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=57100.0, ans=0.0 2024-08-09 15:02:24,005 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-09 15:02:29,595 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS 2024-08-09 15:02:34,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.27 vs.
limit=15.0 2024-08-09 15:02:43,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=57200.0, ans=10.0 2024-08-09 15:02:51,354 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 from AS 2024-08-09 15:02:56,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=57300.0, ans=0.0 2024-08-09 15:03:12,220 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-09 15:03:12,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=57500.0, ans=0.0 2024-08-09 15:03:13,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5750, loss[loss=0.1106, beats_loss=0.01536, ecapa_loss=0.0004287, whisper_loss=0.09092, over 22368.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01357, ecapa_loss=0.0004808, whisper_loss=0.1066, over 3914803.73 frames. ], batch size: 93, lr: 4.01e-02, grad_scale: 128.0 2024-08-09 15:03:13,383 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 13 from Vox, 21 from AS 2024-08-09 15:03:27,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57600.0, ans=0.1 2024-08-09 15:03:32,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2024-08-09 15:03:36,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57600.0, ans=0.1 2024-08-09 15:03:37,042 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts.
29 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 15:03:37,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=57600.0, ans=0.125 2024-08-09 15:03:53,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=57800.0, ans=0.125 2024-08-09 15:04:04,403 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 15:04:18,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.931e+01 3.260e+01 3.924e+01 8.527e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-09 15:04:18,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5800, loss[loss=0.1544, beats_loss=0.009873, ecapa_loss=0.0004946, whisper_loss=0.1396, over 14828.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01347, ecapa_loss=0.0004819, whisper_loss=0.1061, over 3883401.64 frames. ], batch size: 55, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:04:32,887 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-09 15:04:34,110 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-09 15:04:40,515 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 15:04:55,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=58200.0, ans=0.125 2024-08-09 15:04:56,599 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 15:05:23,217 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 15:05:24,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=58500.0, ans=0.0 2024-08-09 15:05:25,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5850, loss[loss=0.1124, beats_loss=0.0138, ecapa_loss=0.0005036, whisper_loss=0.0936, over 14296.00 frames. ], tot_loss[loss=0.1246, beats_loss=0.01358, ecapa_loss=0.0004795, whisper_loss=0.1062, over 3909294.93 frames. ], batch size: 58, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:05:25,721 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 15:05:37,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.46 vs. limit=10.0 2024-08-09 15:05:38,212 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 15:05:39,516 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-09 15:05:39,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=58600.0, ans=0.1 2024-08-09 15:05:40,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2024-08-09 15:05:44,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-08-09 15:05:48,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=10.0 2024-08-09 15:05:54,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=58700.0, ans=0.125 2024-08-09 15:05:59,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=58700.0, ans=0.0 2024-08-09 15:06:34,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 3.149e+01 3.698e+01 4.735e+01 7.316e+01, threshold=7.396e+01, percent-clipped=3.0 2024-08-09 15:06:34,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5900, loss[loss=0.1361, beats_loss=0.01075, ecapa_loss=0.0004756, whisper_loss=0.1206, over 13928.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.0136, ecapa_loss=0.0004793, whisper_loss=0.1063, over 3903351.45 frames. ], batch size: 54, lr: 3.99e-02, grad_scale: 128.0 2024-08-09 15:06:37,042 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 15:06:44,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.30 vs. 
limit=15.0 2024-08-09 15:06:45,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=59000.0, ans=0.0 2024-08-09 15:06:54,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59100.0, ans=0.1 2024-08-09 15:07:18,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=59300.0, ans=0.0 2024-08-09 15:07:25,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=59300.0, ans=0.125 2024-08-09 15:07:28,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-08-09 15:07:28,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=59400.0, ans=0.125 2024-08-09 15:07:40,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 5950, loss[loss=0.06824, beats_loss=0.01403, ecapa_loss=0.000477, whisper_loss=0.04944, over 14167.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01354, ecapa_loss=0.0004774, whisper_loss=0.1067, over 3896781.85 frames. ], batch size: 57, lr: 3.98e-02, grad_scale: 128.0 2024-08-09 15:07:46,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=59500.0, ans=0.125 2024-08-09 15:08:01,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=59600.0, ans=0.0 2024-08-09 15:08:04,995 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 15:08:17,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=59700.0, ans=0.04949747468305833 2024-08-09 15:08:21,708 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 15:08:24,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=59800.0, ans=0.0 2024-08-09 15:08:52,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.855e+01 3.241e+01 4.234e+01 7.891e+01, threshold=6.482e+01, percent-clipped=2.0 2024-08-09 15:08:52,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6000, loss[loss=0.1198, beats_loss=0.0151, ecapa_loss=0.0004349, whisper_loss=0.1004, over 22105.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01354, ecapa_loss=0.000474, whisper_loss=0.1061, over 3886818.22 frames. ], batch size: 91, lr: 3.98e-02, grad_scale: 256.0 2024-08-09 15:08:52,090 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 15:09:28,575 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on ASR_libri: loss=0.2951, beats_loss=0, ecapa_loss=0.001297, whisper_loss=0.2822, over 922467.00 frames. 2024-08-09 15:09:46,148 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on SV_voxceleb1: loss=0.01236, beats_loss=0, ecapa_loss=0.001236, whisper_loss=0, over 939242.00 frames. 
2024-08-09 15:10:33,620 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8933, 4.4501, 4.4697, 3.9297], device='cuda:3') 2024-08-09 15:11:04,651 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0384, 3.9621, 3.8757, 4.0394], device='cuda:3') 2024-08-09 15:11:29,843 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on AT_audioset: loss=0.03246, beats_loss=0.03246, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 15:11:29,847 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 15:11:31,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=60000.0, ans=0.2 2024-08-09 15:11:40,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=60000.0, ans=0.125 2024-08-09 15:11:43,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=60100.0, ans=0.2 2024-08-09 15:11:50,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5 2024-08-09 15:11:59,361 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 15:12:33,873 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 15:12:44,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6050, loss[loss=0.1356, beats_loss=0.01255, ecapa_loss=0.000534, whisper_loss=0.1177, over 20739.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01349, ecapa_loss=0.0004729, whisper_loss=0.1063, over 3889651.37 frames. 
], batch size: 86, lr: 3.97e-02, grad_scale: 256.0 2024-08-09 15:12:53,730 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 15:13:09,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=15.0 2024-08-09 15:13:25,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=15.0 2024-08-09 15:13:31,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.45 vs. limit=22.5 2024-08-09 15:13:49,470 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.500e+00 2024-08-09 15:13:55,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=60900.0, ans=0.0 2024-08-09 15:13:59,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.011e+01 3.542e+01 4.337e+01 6.873e+01, threshold=7.084e+01, percent-clipped=1.0 2024-08-09 15:13:59,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6100, loss[loss=0.1348, beats_loss=0.01087, ecapa_loss=0.0005409, whisper_loss=0.1185, over 20962.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01344, ecapa_loss=0.0004707, whisper_loss=0.1064, over 3869896.86 frames. ], batch size: 86, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:14:07,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=61000.0, ans=0.0 2024-08-09 15:14:16,218 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:14:20,427 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 15:14:41,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=61200.0, ans=0.0 2024-08-09 15:14:52,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-09 15:14:53,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=61300.0, ans=0.125 2024-08-09 15:14:59,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=61400.0, ans=0.0 2024-08-09 15:15:13,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6150, loss[loss=0.1273, beats_loss=0.01267, ecapa_loss=0.0004786, whisper_loss=0.1098, over 22805.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01334, ecapa_loss=0.0004713, whisper_loss=0.1069, over 3887623.16 frames. ], batch size: 92, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:15:29,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2024-08-09 15:15:34,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=61600.0, ans=0.09899494936611666 2024-08-09 15:15:37,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=61600.0, ans=0.2 2024-08-09 15:15:41,487 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 15:15:57,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=61800.0, ans=0.125 2024-08-09 15:16:10,036 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
14 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 15:16:12,782 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 15:16:21,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=61900.0, ans=0.125 2024-08-09 15:16:27,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.116e+01 3.579e+01 4.385e+01 6.920e+01, threshold=7.157e+01, percent-clipped=0.0 2024-08-09 15:16:27,995 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6200, loss[loss=0.1147, beats_loss=0.0122, ecapa_loss=0.0004834, whisper_loss=0.09768, over 21122.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01333, ecapa_loss=0.0004717, whisper_loss=0.1069, over 3893353.98 frames. ], batch size: 83, lr: 3.95e-02, grad_scale: 256.0 2024-08-09 15:17:07,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-09 15:17:11,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=62300.0, ans=0.125 2024-08-09 15:17:13,848 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 15:17:20,957 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 15:17:43,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6250, loss[loss=0.1285, beats_loss=0.01344, ecapa_loss=0.0003956, whisper_loss=0.1111, over 16129.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01338, ecapa_loss=0.0004703, whisper_loss=0.1067, over 3870255.64 frames. ], batch size: 62, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:17:47,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=19.91 vs. 
limit=15.0 2024-08-09 15:18:02,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=62600.0, ans=0.0 2024-08-09 15:18:14,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=62700.0, ans=0.0 2024-08-09 15:18:19,900 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-09 15:18:23,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=62700.0, ans=0.125 2024-08-09 15:18:24,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=62700.0, ans=0.2 2024-08-09 15:18:36,377 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 15:18:55,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=62900.0, ans=0.1 2024-08-09 15:19:00,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.965e+01 3.406e+01 4.255e+01 1.028e+02, threshold=6.812e+01, percent-clipped=2.0 2024-08-09 15:19:00,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6300, loss[loss=0.1336, beats_loss=0.01377, ecapa_loss=0.0005154, whisper_loss=0.1147, over 22188.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01341, ecapa_loss=0.0004709, whisper_loss=0.1067, over 3892085.94 frames. ], batch size: 90, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:19:02,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=63000.0, ans=0.0 2024-08-09 15:19:20,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. 
limit=15.0 2024-08-09 15:19:56,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=63300.0, ans=0.125 2024-08-09 15:20:06,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=63400.0, ans=0.0 2024-08-09 15:20:12,709 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 15:20:16,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63400.0, ans=0.1 2024-08-09 15:20:18,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6350, loss[loss=0.1338, beats_loss=0.0149, ecapa_loss=0.0004028, whisper_loss=0.1149, over 20586.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01341, ecapa_loss=0.0004676, whisper_loss=0.1069, over 3898717.32 frames. ], batch size: 78, lr: 3.93e-02, grad_scale: 256.0 2024-08-09 15:20:30,139 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-09 15:20:32,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=11.21 vs. limit=10.0 2024-08-09 15:20:38,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=63600.0, ans=0.125 2024-08-09 15:20:40,908 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-09 15:21:00,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=63700.0, ans=0.125 2024-08-09 15:21:00,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. 
limit=15.0 2024-08-09 15:21:06,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=63800.0, ans=0.125 2024-08-09 15:21:07,329 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 15:21:10,543 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 15:21:12,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=63800.0, ans=0.0 2024-08-09 15:21:15,412 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 15:21:32,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=63900.0, ans=0.0 2024-08-09 15:21:38,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.075e+01 3.568e+01 4.201e+01 6.933e+01, threshold=7.136e+01, percent-clipped=1.0 2024-08-09 15:21:38,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6400, loss[loss=0.1491, beats_loss=0.01022, ecapa_loss=0.0005588, whisper_loss=0.1333, over 19614.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01336, ecapa_loss=0.0004666, whisper_loss=0.1071, over 3907667.87 frames. ], batch size: 79, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:21:44,951 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-09 15:21:45,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-08-09 15:21:52,212 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 15:21:58,407 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 15:22:07,836 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-09 15:22:28,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=64300.0, ans=15.0 2024-08-09 15:22:31,229 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 15:22:36,224 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 15:22:47,143 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 15:22:50,286 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-09 15:22:57,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6450, loss[loss=0.1128, beats_loss=0.01467, ecapa_loss=0.0004088, whisper_loss=0.09407, over 19053.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01337, ecapa_loss=0.0004663, whisper_loss=0.1074, over 3916791.63 frames. ], batch size: 74, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:23:21,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-09 15:23:23,694 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 15:23:29,271 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 15:23:32,364 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-09 15:23:43,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=64700.0, ans=0.2 2024-08-09 15:23:51,060 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 15:23:54,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=64800.0, ans=0.02 2024-08-09 15:23:54,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.62 vs. limit=15.0 2024-08-09 15:23:56,418 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-09 15:24:06,645 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-09 15:24:11,319 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 15:24:15,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2024-08-09 15:24:17,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.103e+01 3.527e+01 4.351e+01 8.335e+01, threshold=7.053e+01, percent-clipped=1.0 2024-08-09 15:24:17,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6500, loss[loss=0.1335, beats_loss=0.0115, ecapa_loss=0.0004115, whisper_loss=0.1179, over 18009.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01336, ecapa_loss=0.0004642, whisper_loss=0.1075, over 3969390.68 frames. ], batch size: 68, lr: 3.91e-02, grad_scale: 256.0 2024-08-09 15:24:18,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=65000.0, ans=0.0 2024-08-09 15:25:11,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=65300.0, ans=0.2 2024-08-09 15:25:37,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. 
limit=15.0 2024-08-09 15:25:37,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6550, loss[loss=0.117, beats_loss=0.01418, ecapa_loss=0.0004346, whisper_loss=0.09843, over 22212.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01345, ecapa_loss=0.0004626, whisper_loss=0.107, over 3934196.51 frames. ], batch size: 87, lr: 3.91e-02, grad_scale: 256.0 2024-08-09 15:25:49,582 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 15:25:59,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=65600.0, ans=0.125 2024-08-09 15:26:08,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=65600.0, ans=0.0 2024-08-09 15:26:12,605 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-09 15:26:12,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65700.0, ans=0.125 2024-08-09 15:26:18,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=65700.0, ans=0.2 2024-08-09 15:26:35,906 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 15:26:44,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=65900.0, ans=0.2 2024-08-09 15:26:53,090 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-09 15:26:57,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 3.063e+01 3.628e+01 4.391e+01 7.750e+01, threshold=7.256e+01, percent-clipped=3.0 2024-08-09 15:26:57,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6600, loss[loss=0.1091, beats_loss=0.01421, ecapa_loss=0.000433, whisper_loss=0.09051, over 22265.00 frames. 
], tot_loss[loss=0.1246, beats_loss=0.01351, ecapa_loss=0.0004645, whisper_loss=0.1064, over 3946254.44 frames. ], batch size: 91, lr: 3.90e-02, grad_scale: 256.0 2024-08-09 15:27:02,438 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 15:27:05,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=66000.0, ans=0.0 2024-08-09 15:27:12,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=66100.0, ans=0.125 2024-08-09 15:27:12,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=66100.0, ans=0.125 2024-08-09 15:27:35,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=66200.0, ans=0.125 2024-08-09 15:27:43,333 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 15:28:14,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6650, loss[loss=0.1589, beats_loss=0.01182, ecapa_loss=0.0004889, whisper_loss=0.1422, over 23362.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01342, ecapa_loss=0.0004635, whisper_loss=0.1067, over 3929115.29 frames. ], batch size: 90, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:28:32,757 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 15:28:35,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=66600.0, ans=0.0 2024-08-09 15:28:50,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=66700.0, ans=0.125 2024-08-09 15:28:54,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=66700.0, ans=0.125 2024-08-09 15:28:56,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.17 vs. limit=15.0 2024-08-09 15:28:57,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=66700.0, ans=0.125 2024-08-09 15:29:31,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.012e+01 3.433e+01 4.224e+01 7.038e+01, threshold=6.866e+01, percent-clipped=0.0 2024-08-09 15:29:31,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6700, loss[loss=0.1209, beats_loss=0.01248, ecapa_loss=0.0005691, whisper_loss=0.1027, over 19384.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01342, ecapa_loss=0.0004638, whisper_loss=0.1064, over 3912296.85 frames. ], batch size: 80, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:29:43,267 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 15:29:45,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.57 vs. 
limit=15.0 2024-08-09 15:29:54,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=67100.0, ans=0.125 2024-08-09 15:29:59,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=67100.0, ans=0.125 2024-08-09 15:30:01,107 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 15:30:15,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=67200.0, ans=0.125 2024-08-09 15:30:15,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=67200.0, ans=0.0 2024-08-09 15:30:16,911 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 15:30:37,156 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.106e-01 2024-08-09 15:30:39,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=12.0 2024-08-09 15:30:47,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6750, loss[loss=0.1173, beats_loss=0.01302, ecapa_loss=0.0005085, whisper_loss=0.09916, over 22330.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01341, ecapa_loss=0.0004632, whisper_loss=0.1071, over 3910086.95 frames. ], batch size: 92, lr: 3.88e-02, grad_scale: 256.0 2024-08-09 15:30:50,130 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 15:31:12,673 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 15:31:48,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=10.0 2024-08-09 15:31:51,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67900.0, ans=0.1 2024-08-09 15:31:54,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=67900.0, ans=0.2 2024-08-09 15:31:55,636 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 15:31:59,993 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:32:03,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.094e+01 3.540e+01 4.120e+01 7.157e+01, threshold=7.079e+01, percent-clipped=1.0 2024-08-09 15:32:03,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6800, loss[loss=0.1519, beats_loss=0.01226, ecapa_loss=0.000539, whisper_loss=0.1343, over 21545.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01348, ecapa_loss=0.0004627, whisper_loss=0.1062, over 3886139.94 frames. ], batch size: 88, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:32:12,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68000.0, ans=0.125 2024-08-09 15:32:16,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=68000.0, ans=10.0 2024-08-09 15:32:21,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=68100.0, ans=0.0 2024-08-09 15:32:25,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68100.0, ans=0.1 2024-08-09 15:32:37,653 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 15:32:42,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=68200.0, ans=0.0 2024-08-09 15:32:53,033 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 15:33:06,858 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 15:33:17,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6850, loss[loss=0.1288, beats_loss=0.01219, ecapa_loss=0.0004518, whisper_loss=0.1121, over 24051.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01355, ecapa_loss=0.0004608, whisper_loss=0.1055, over 3869907.24 frames. ], batch size: 93, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:33:20,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68500.0, ans=0.125 2024-08-09 15:33:29,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=68500.0, ans=0.125 2024-08-09 15:33:29,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=68500.0, ans=0.125 2024-08-09 15:33:34,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=68600.0, ans=0.2 2024-08-09 15:33:41,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68600.0, ans=0.1 2024-08-09 15:34:01,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=68700.0, ans=0.125 2024-08-09 15:34:16,770 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 15:34:22,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2024-08-09 15:34:23,367 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 15:34:26,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=68900.0, ans=0.125 2024-08-09 15:34:27,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=68900.0, ans=0.0 2024-08-09 15:34:32,210 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 15:34:33,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.018e+01 3.583e+01 4.075e+01 7.184e+01, threshold=7.167e+01, percent-clipped=2.0 2024-08-09 15:34:33,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6900, loss[loss=0.1438, beats_loss=0.01253, ecapa_loss=0.0004472, whisper_loss=0.1268, over 22311.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01352, ecapa_loss=0.0004558, whisper_loss=0.106, over 3853336.40 frames. ], batch size: 88, lr: 3.86e-02, grad_scale: 256.0 2024-08-09 15:34:45,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=12.0 2024-08-09 15:34:46,908 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 15:35:08,864 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
21 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 15:35:14,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=69200.0, ans=0.0 2024-08-09 15:35:14,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=69200.0, ans=0.0 2024-08-09 15:35:17,980 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 15:35:21,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=69300.0, ans=0.015 2024-08-09 15:35:44,655 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 15:35:49,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 6950, loss[loss=0.1311, beats_loss=0.01339, ecapa_loss=0.0004606, whisper_loss=0.1131, over 22793.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01356, ecapa_loss=0.0004527, whisper_loss=0.1055, over 3839081.73 frames. ], batch size: 93, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:35:54,594 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 15:36:09,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=69600.0, ans=0.125 2024-08-09 15:36:13,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2024-08-09 15:36:35,570 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 15:36:39,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=69800.0, ans=0.125 2024-08-09 15:37:07,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.090e+01 3.523e+01 4.430e+01 8.295e+01, threshold=7.046e+01, percent-clipped=3.0 2024-08-09 15:37:07,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7000, loss[loss=0.1132, beats_loss=0.01047, ecapa_loss=0.0004676, whisper_loss=0.09807, over 16814.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01346, ecapa_loss=0.0004562, whisper_loss=0.1047, over 3833656.86 frames. ], batch size: 64, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:37:13,341 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-09 15:37:16,497 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 34 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 15:37:26,238 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 15:37:44,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=70200.0, ans=0.125 2024-08-09 15:37:59,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=70300.0, ans=0.0 2024-08-09 15:38:00,832 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 15:38:13,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.38 vs. 
limit=15.0 2024-08-09 15:38:26,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=70400.0, ans=0.125 2024-08-09 15:38:29,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7050, loss[loss=0.125, beats_loss=0.01079, ecapa_loss=0.0004355, whisper_loss=0.1099, over 16771.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01344, ecapa_loss=0.000456, whisper_loss=0.1051, over 3863235.15 frames. ], batch size: 63, lr: 3.84e-02, grad_scale: 256.0 2024-08-09 15:38:50,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=70600.0, ans=0.125 2024-08-09 15:38:52,437 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-09 15:38:58,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=70700.0, ans=0.0 2024-08-09 15:39:00,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2024-08-09 15:39:09,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=70700.0, ans=0.0 2024-08-09 15:39:18,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=70800.0, ans=0.0 2024-08-09 15:39:42,613 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 15:39:43,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.931e+01 3.439e+01 4.149e+01 6.385e+01, threshold=6.878e+01, percent-clipped=0.0 2024-08-09 15:39:43,695 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7100, loss[loss=0.116, beats_loss=0.01381, ecapa_loss=0.0004078, whisper_loss=0.09808, over 22664.00 frames. 
], tot_loss[loss=0.1226, beats_loss=0.01344, ecapa_loss=0.000454, whisper_loss=0.1046, over 3854248.81 frames. ], batch size: 90, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:39:45,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=71000.0, ans=0.07 2024-08-09 15:39:46,423 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 15:39:53,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=71000.0, ans=0.125 2024-08-09 15:40:11,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=71100.0, ans=0.95 2024-08-09 15:40:15,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71200.0, ans=0.1 2024-08-09 15:40:23,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=71200.0, ans=0.125 2024-08-09 15:40:31,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=71300.0, ans=0.125 2024-08-09 15:40:46,933 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-09 15:41:00,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=71500.0, ans=0.1 2024-08-09 15:41:00,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7150, loss[loss=0.1223, beats_loss=0.01314, ecapa_loss=0.0004888, whisper_loss=0.1043, over 22071.00 frames. ], tot_loss[loss=0.1228, beats_loss=0.01336, ecapa_loss=0.0004543, whisper_loss=0.1049, over 3865137.03 frames. 
], batch size: 93, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:41:06,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0 2024-08-09 15:41:10,630 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 34 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 15:41:59,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71800.0, ans=0.125 2024-08-09 15:42:16,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=71900.0, ans=0.05 2024-08-09 15:42:21,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.087e+01 3.536e+01 4.239e+01 7.384e+01, threshold=7.073e+01, percent-clipped=1.0 2024-08-09 15:42:21,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7200, loss[loss=0.1209, beats_loss=0.01356, ecapa_loss=0.0003926, whisper_loss=0.1034, over 20742.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01328, ecapa_loss=0.0004534, whisper_loss=0.1061, over 3884867.58 frames. ], batch size: 83, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:42:25,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72000.0, ans=0.1 2024-08-09 15:42:35,067 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 15:42:50,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=72100.0, ans=0.125 2024-08-09 15:42:55,549 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-09 15:43:33,220 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:43:35,362 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 14 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 15:43:45,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=72500.0, ans=0.04949747468305833 2024-08-09 15:43:46,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7250, loss[loss=0.1308, beats_loss=0.0128, ecapa_loss=0.0004535, whisper_loss=0.1135, over 22378.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01333, ecapa_loss=0.0004513, whisper_loss=0.1059, over 3884252.05 frames. ], batch size: 90, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:43:54,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=72500.0, ans=0.0 2024-08-09 15:43:59,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=72500.0, ans=0.04949747468305833 2024-08-09 15:44:09,614 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 15:44:11,367 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 15:44:29,681 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:44:35,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=72700.0, ans=0.125 2024-08-09 15:44:36,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.83 vs. limit=10.0 2024-08-09 15:44:41,149 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
17 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 15:45:05,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=72900.0, ans=0.125 2024-08-09 15:45:08,582 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:45:08,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=72900.0, ans=0.125 2024-08-09 15:45:17,035 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 15:45:20,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 3.061e+01 3.709e+01 4.320e+01 7.317e+01, threshold=7.418e+01, percent-clipped=1.0 2024-08-09 15:45:20,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7300, loss[loss=0.1369, beats_loss=0.01248, ecapa_loss=0.0004769, whisper_loss=0.1197, over 23098.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01332, ecapa_loss=0.0004515, whisper_loss=0.1065, over 3898040.98 frames. ], batch size: 91, lr: 3.81e-02, grad_scale: 256.0 2024-08-09 15:45:21,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-09 15:45:40,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 15:45:41,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=73100.0, ans=15.0 2024-08-09 15:45:59,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73200.0, ans=0.125 2024-08-09 15:46:00,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-09 15:46:09,651 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 40 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-09 15:46:13,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=73200.0, ans=0.125 2024-08-09 15:46:15,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=73200.0, ans=0.0 2024-08-09 15:46:38,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=73400.0, ans=0.0 2024-08-09 15:46:44,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-09 15:46:52,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7350, loss[loss=0.1222, beats_loss=0.01182, ecapa_loss=0.0005344, whisper_loss=0.1051, over 16185.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01329, ecapa_loss=0.0004516, whisper_loss=0.1067, over 3890173.56 frames. 
], batch size: 64, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:46:53,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=73500.0, ans=0.125 2024-08-09 15:46:55,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-09 15:47:06,090 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-09 15:47:08,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=73600.0, ans=0.125 2024-08-09 15:47:13,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.06 vs. limit=15.0 2024-08-09 15:47:15,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=73600.0, ans=0.125 2024-08-09 15:47:15,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=73600.0, ans=0.125 2024-08-09 15:47:42,350 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 15:47:58,175 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 15:48:01,989 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 15:48:13,217 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 15:48:20,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.48 vs. 
limit=15.0 2024-08-09 15:48:21,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.843e+01 3.393e+01 4.039e+01 7.371e+01, threshold=6.786e+01, percent-clipped=0.0 2024-08-09 15:48:21,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7400, loss[loss=0.1051, beats_loss=0.01214, ecapa_loss=0.0004874, whisper_loss=0.08806, over 17946.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01329, ecapa_loss=0.0004508, whisper_loss=0.1065, over 3893992.64 frames. ], batch size: 73, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:48:38,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-08-09 15:48:46,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=74100.0, ans=0.125 2024-08-09 15:48:46,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2024-08-09 15:49:02,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=74200.0, ans=0.125 2024-08-09 15:49:06,189 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 15:49:16,743 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:49:20,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.72 vs. 
limit=15.0 2024-08-09 15:49:36,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=74400.0, ans=0.125 2024-08-09 15:49:51,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=74400.0, ans=0.0 2024-08-09 15:49:55,618 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 38 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-09 15:49:55,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=74500.0, ans=0.2 2024-08-09 15:49:56,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7450, loss[loss=0.1529, beats_loss=0.01085, ecapa_loss=0.0003713, whisper_loss=0.1383, over 22316.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01326, ecapa_loss=0.0004508, whisper_loss=0.1066, over 3905800.94 frames. ], batch size: 83, lr: 3.79e-02, grad_scale: 256.0 2024-08-09 15:50:07,144 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-09 15:50:08,859 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-09 15:50:18,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=74600.0, ans=0.125 2024-08-09 15:50:26,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2024-08-09 15:50:42,407 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 15:50:46,337 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 15:51:13,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 3.130e+01 3.399e+01 4.155e+01 7.076e+01, threshold=6.798e+01, percent-clipped=1.0 2024-08-09 15:51:13,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7500, loss[loss=0.143, beats_loss=0.01115, ecapa_loss=0.0004955, whisper_loss=0.1269, over 22320.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01331, ecapa_loss=0.0004518, whisper_loss=0.1057, over 3896859.74 frames. ], batch size: 90, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:51:15,829 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-09 15:51:33,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=75100.0, ans=0.2 2024-08-09 15:52:01,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=75300.0, ans=0.125 2024-08-09 15:52:05,654 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 15:52:06,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2024-08-09 15:52:11,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-09 15:52:17,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=75400.0, ans=0.125 2024-08-09 15:52:24,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7550, loss[loss=0.1536, beats_loss=0.009955, ecapa_loss=0.0004656, whisper_loss=0.139, over 15476.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01323, ecapa_loss=0.0004512, whisper_loss=0.106, over 3856360.25 frames. 
], batch size: 59, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:52:39,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=12.0 2024-08-09 15:52:54,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-09 15:53:01,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=75700.0, ans=0.125 2024-08-09 15:53:06,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=75800.0, ans=0.125 2024-08-09 15:53:10,771 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-09 15:53:13,741 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-09 15:53:20,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-09 15:53:31,556 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 15:53:35,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 3.036e+01 3.542e+01 4.226e+01 5.898e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 15:53:35,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7600, loss[loss=0.1245, beats_loss=0.01229, ecapa_loss=0.000526, whisper_loss=0.1069, over 21729.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01315, ecapa_loss=0.0004519, whisper_loss=0.1064, over 3863887.78 frames. ], batch size: 94, lr: 3.77e-02, grad_scale: 256.0 2024-08-09 15:53:37,048 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 21 from Vox, 41 from AS
2024-08-09 15:53:42,701 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 13 from LS+wenet, 24 from Vox, 52 from AS
2024-08-09 15:53:48,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76100.0, ans=0.1
2024-08-09 15:53:49,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76100.0, ans=0.125
2024-08-09 15:53:55,464 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 from AS
2024-08-09 15:54:08,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0
2024-08-09 15:54:14,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0
2024-08-09 15:54:18,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=76300.0, ans=0.0
2024-08-09 15:54:41,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=76400.0, ans=0.125
2024-08-09 15:54:44,980 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 from AS
2024-08-09 15:54:46,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7650, loss[loss=0.1247, beats_loss=0.01349, ecapa_loss=0.0005069, whisper_loss=0.1061, over 17392.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01329, ecapa_loss=0.0004473, whisper_loss=0.1055, over 3876411.63 frames. ], batch size: 70, lr: 3.77e-02, grad_scale: 256.0
2024-08-09 15:55:11,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=76600.0, ans=0.125
2024-08-09 15:55:17,259 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 24 from Vox, 28 from AS
2024-08-09 15:55:25,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0
2024-08-09 15:55:46,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=76900.0, ans=0.125
2024-08-09 15:55:51,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=76900.0, ans=0.125
2024-08-09 15:55:55,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.321e+01 3.065e+01 3.556e+01 4.140e+01 7.466e+01, threshold=7.113e+01, percent-clipped=1.0
2024-08-09 15:55:55,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7700, loss[loss=0.1141, beats_loss=0.01497, ecapa_loss=0.0004321, whisper_loss=0.0948, over 22284.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01324, ecapa_loss=0.0004458, whisper_loss=0.1064, over 3891334.01 frames. ], batch size: 92, lr: 3.76e-02, grad_scale: 256.0
2024-08-09 15:56:11,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=77100.0, ans=0.95
2024-08-09 15:56:13,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.59 vs. limit=22.5
2024-08-09 15:56:15,747 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS
2024-08-09 15:56:19,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.81 vs. limit=22.5
2024-08-09 15:56:35,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=77200.0, ans=0.125
2024-08-09 15:56:40,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5
2024-08-09 15:56:44,574 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 17 from Vox, 34 from AS
2024-08-09 15:56:45,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=77300.0, ans=22.5
2024-08-09 15:56:46,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=77300.0, ans=0.1
2024-08-09 15:56:56,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=77400.0, ans=0.125
2024-08-09 15:57:07,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7750, loss[loss=0.1532, beats_loss=0.009831, ecapa_loss=0.0005058, whisper_loss=0.1383, over 21330.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01321, ecapa_loss=0.0004454, whisper_loss=0.1059, over 3894214.94 frames. ], batch size: 84, lr: 3.75e-02, grad_scale: 256.0
2024-08-09 15:57:07,809 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-09 15:57:37,373 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 from AS
2024-08-09 15:57:51,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=77800.0, ans=0.125
2024-08-09 15:57:56,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=8.0
2024-08-09 15:57:56,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=77800.0, ans=0.125
2024-08-09 15:58:02,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=77900.0, ans=0.125
2024-08-09 15:58:02,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.83 vs. limit=22.5
2024-08-09 15:58:09,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=77900.0, ans=0.0
2024-08-09 15:58:13,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=77900.0, ans=0.2
2024-08-09 15:58:17,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.915e+01 3.303e+01 4.126e+01 7.711e+01, threshold=6.607e+01, percent-clipped=1.0
2024-08-09 15:58:17,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7800, loss[loss=0.1045, beats_loss=0.01859, ecapa_loss=0.0003619, whisper_loss=0.08233, over 19666.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01334, ecapa_loss=0.0004443, whisper_loss=0.1057, over 3910314.52 frames. ], batch size: 79, lr: 3.75e-02, grad_scale: 256.0
2024-08-09 15:58:22,932 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 35 from LS+wenet, 22 from Vox, 24 from AS
2024-08-09 15:58:28,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=78000.0, ans=0.125
2024-08-09 15:58:39,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78100.0, ans=0.125
2024-08-09 15:58:52,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=78200.0, ans=0.0
2024-08-09 15:58:54,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=78200.0, ans=0.125
2024-08-09 15:59:01,295 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS
2024-08-09 15:59:05,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5
2024-08-09 15:59:15,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0
2024-08-09 15:59:25,455 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 from AS
2024-08-09 15:59:25,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=78500.0, ans=0.0
2024-08-09 15:59:26,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7850, loss[loss=0.1416, beats_loss=0.0103, ecapa_loss=0.0004915, whisper_loss=0.1264, over 23323.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01338, ecapa_loss=0.0004422, whisper_loss=0.1058, over 3896085.65 frames. ], batch size: 90, lr: 3.74e-02, grad_scale: 256.0
2024-08-09 15:59:36,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=78500.0, ans=0.125
2024-08-09 15:59:36,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=78500.0, ans=0.125
2024-08-09 15:59:37,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-08-09 15:59:56,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=78700.0, ans=0.1
2024-08-09 16:00:02,553 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 12 from Vox, 35 from AS
2024-08-09 16:00:27,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=78900.0, ans=0.0
2024-08-09 16:00:35,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 3.036e+01 3.521e+01 4.450e+01 7.582e+01, threshold=7.043e+01, percent-clipped=4.0
2024-08-09 16:00:35,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7900, loss[loss=0.1077, beats_loss=0.01284, ecapa_loss=0.0004355, whisper_loss=0.09047, over 15417.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.0135, ecapa_loss=0.0004415, whisper_loss=0.1054, over 3899647.07 frames. ], batch size: 58, lr: 3.73e-02, grad_scale: 256.0
2024-08-09 16:00:44,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5
2024-08-09 16:00:52,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0
2024-08-09 16:00:58,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=79100.0, ans=0.125
2024-08-09 16:01:05,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0
2024-08-09 16:01:09,839 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 16:01:11,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=79200.0, ans=0.2
2024-08-09 16:01:27,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=79300.0, ans=0.125
2024-08-09 16:01:36,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=79400.0, ans=0.0
2024-08-09 16:01:43,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 7950, loss[loss=0.1216, beats_loss=0.01641, ecapa_loss=0.0004215, whisper_loss=0.101, over 21884.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01336, ecapa_loss=0.0004418, whisper_loss=0.1059, over 3895768.82 frames. ], batch size: 91, lr: 3.73e-02, grad_scale: 256.0
2024-08-09 16:01:48,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=79500.0, ans=0.125
2024-08-09 16:01:51,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=79500.0, ans=0.0
2024-08-09 16:01:55,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79500.0, ans=0.1
2024-08-09 16:02:23,553 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 25 from Vox, 24 from AS
2024-08-09 16:02:29,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0
2024-08-09 16:02:42,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=79900.0, ans=0.125
2024-08-09 16:02:54,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 3.066e+01 3.561e+01 4.217e+01 9.530e+01, threshold=7.122e+01, percent-clipped=2.0
2024-08-09 16:02:54,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8000, loss[loss=0.112, beats_loss=0.015, ecapa_loss=0.0004046, whisper_loss=0.09291, over 22361.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01333, ecapa_loss=0.0004405, whisper_loss=0.1057, over 3906245.98 frames. ], batch size: 92, lr: 3.72e-02, grad_scale: 512.0
2024-08-09 16:02:54,719 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS
2024-08-09 16:03:01,204 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 21 from Vox, 24 from AS
2024-08-09 16:03:04,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=80000.0, ans=0.09899494936611666
2024-08-09 16:03:06,941 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 from AS
2024-08-09 16:03:11,149 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 30 from Vox, 32 from AS
2024-08-09 16:03:18,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=80100.0, ans=0.025
2024-08-09 16:03:30,337 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.282e-01
2024-08-09 16:03:41,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=80300.0, ans=0.125
2024-08-09 16:04:00,493 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 from AS
2024-08-09 16:04:01,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8050, loss[loss=0.1237, beats_loss=0.01575, ecapa_loss=0.0003614, whisper_loss=0.1043, over 20932.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01327, ecapa_loss=0.0004402, whisper_loss=0.1062, over 3928520.17 frames. ], batch size: 80, lr: 3.72e-02, grad_scale: 512.0
2024-08-09 16:04:06,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80500.0, ans=0.1
2024-08-09 16:04:21,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=80600.0, ans=0.0
2024-08-09 16:04:39,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80700.0, ans=0.1
2024-08-09 16:04:49,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=80800.0, ans=0.125
2024-08-09 16:05:01,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=80900.0, ans=0.5
2024-08-09 16:05:03,033 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.130e+00
2024-08-09 16:05:08,457 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-09 16:05:10,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 3.011e+01 3.515e+01 4.189e+01 8.391e+01, threshold=7.029e+01, percent-clipped=0.0
2024-08-09 16:05:10,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8100, loss[loss=0.15, beats_loss=0.00995, ecapa_loss=0.0004957, whisper_loss=0.1351, over 22387.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01327, ecapa_loss=0.0004409, whisper_loss=0.1059, over 3929060.88 frames. ], batch size: 88, lr: 3.71e-02, grad_scale: 512.0
2024-08-09 16:05:15,029 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 from AS
2024-08-09 16:05:21,653 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 27 from Vox, 33 from AS
2024-08-09 16:05:26,698 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 from AS
2024-08-09 16:05:26,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=81100.0, ans=10.0
2024-08-09 16:05:32,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0
2024-08-09 16:05:37,730 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 from AS
2024-08-09 16:05:42,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81200.0, ans=0.1
2024-08-09 16:05:44,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=81200.0, ans=0.125
2024-08-09 16:05:51,302 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS
2024-08-09 16:05:56,901 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 35 from LS+wenet, 21 from Vox, 26 from AS
2024-08-09 16:06:06,581 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS
2024-08-09 16:06:09,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81400.0, ans=0.125
2024-08-09 16:06:19,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8150, loss[loss=0.1374, beats_loss=0.01342, ecapa_loss=0.0003221, whisper_loss=0.1207, over 17442.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01325, ecapa_loss=0.0004393, whisper_loss=0.1057, over 3884787.94 frames. ], batch size: 64, lr: 3.70e-02, grad_scale: 512.0
2024-08-09 16:06:29,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=81500.0, ans=0.125
2024-08-09 16:06:33,236 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-09 16:06:38,935 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 from AS
2024-08-09 16:06:42,969 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS
2024-08-09 16:07:03,978 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 19 from Vox, 53 from AS
2024-08-09 16:07:11,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2024-08-09 16:07:12,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0
2024-08-09 16:07:23,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=81900.0, ans=0.2
2024-08-09 16:07:27,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.111e+01 3.553e+01 4.149e+01 8.297e+01, threshold=7.106e+01, percent-clipped=2.0
2024-08-09 16:07:27,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8200, loss[loss=0.06507, beats_loss=0.01686, ecapa_loss=0.0003854, whisper_loss=0.04435, over 15186.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01332, ecapa_loss=0.0004382, whisper_loss=0.1046, over 3873641.52 frames. ], batch size: 62, lr: 3.70e-02, grad_scale: 512.0
2024-08-09 16:07:29,345 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 from AS
2024-08-09 16:07:38,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0
2024-08-09 16:07:38,575 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 from AS
2024-08-09 16:07:43,843 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 from AS
2024-08-09 16:07:45,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=82100.0, ans=10.0
2024-08-09 16:07:56,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82200.0, ans=0.125
2024-08-09 16:08:11,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=82300.0, ans=0.0
2024-08-09 16:08:13,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=82300.0, ans=0.05
2024-08-09 16:08:16,974 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS
2024-08-09 16:08:18,321 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 28 from LS+wenet, 23 from Vox, 45 from AS
2024-08-09 16:08:36,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8250, loss[loss=0.1158, beats_loss=0.01411, ecapa_loss=0.0003933, whisper_loss=0.09779, over 18618.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01343, ecapa_loss=0.0004349, whisper_loss=0.1039, over 3886545.91 frames. ], batch size: 74, lr: 3.69e-02, grad_scale: 512.0
2024-08-09 16:08:50,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=82600.0, ans=0.125
2024-08-09 16:09:05,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=82700.0, ans=0.125
2024-08-09 16:09:19,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82800.0, ans=0.1
2024-08-09 16:09:29,171 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.918e-01
2024-08-09 16:09:31,236 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-09 16:09:37,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=82900.0, ans=0.0
2024-08-09 16:09:44,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.998e+01 3.523e+01 3.969e+01 6.917e+01, threshold=7.045e+01, percent-clipped=0.0
2024-08-09 16:09:44,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8300, loss[loss=0.1344, beats_loss=0.01185, ecapa_loss=0.0004425, whisper_loss=0.1181, over 20146.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01338, ecapa_loss=0.0004314, whisper_loss=0.1044, over 3885705.14 frames. ], batch size: 83, lr: 3.68e-02, grad_scale: 512.0
2024-08-09 16:10:01,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83100.0, ans=0.1
2024-08-09 16:10:08,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5
2024-08-09 16:10:16,607 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-09 16:10:29,819 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 from AS
2024-08-09 16:10:42,091 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 16:10:57,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8350, loss[loss=0.1088, beats_loss=0.01419, ecapa_loss=0.0004578, whisper_loss=0.09001, over 16245.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01337, ecapa_loss=0.0004311, whisper_loss=0.1046, over 3884982.77 frames. ], batch size: 66, lr: 3.68e-02, grad_scale: 512.0
2024-08-09 16:11:09,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=83500.0, ans=0.0
2024-08-09 16:11:11,676 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 from AS
2024-08-09 16:11:24,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=83700.0, ans=0.1
2024-08-09 16:11:42,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=83800.0, ans=0.125
2024-08-09 16:11:49,704 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 from AS
2024-08-09 16:11:54,683 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS
2024-08-09 16:12:08,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.077e+01 3.401e+01 4.133e+01 6.317e+01, threshold=6.802e+01, percent-clipped=0.0
2024-08-09 16:12:08,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8400, loss[loss=0.1147, beats_loss=0.01514, ecapa_loss=0.0003605, whisper_loss=0.09591, over 20109.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.0134, ecapa_loss=0.0004296, whisper_loss=0.1049, over 3911931.67 frames. ], batch size: 77, lr: 3.67e-02, grad_scale: 512.0
2024-08-09 16:12:24,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=84100.0, ans=0.0
2024-08-09 16:12:26,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=84100.0, ans=0.0
2024-08-09 16:12:32,288 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS
2024-08-09 16:12:40,856 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 31 from Vox, 32 from AS
2024-08-09 16:12:48,410 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-09 16:12:55,083 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 from AS
2024-08-09 16:13:02,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=84300.0, ans=0.125
2024-08-09 16:13:20,721 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 24 from Vox, 46 from AS
2024-08-09 16:13:30,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8450, loss[loss=0.1282, beats_loss=0.01343, ecapa_loss=0.0003508, whisper_loss=0.1113, over 22678.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01326, ecapa_loss=0.0004329, whisper_loss=0.1059, over 3909024.79 frames. ], batch size: 88, lr: 3.67e-02, grad_scale: 512.0
2024-08-09 16:13:33,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=84500.0, ans=15.0
2024-08-09 16:13:59,166 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS
2024-08-09 16:14:08,280 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 from AS
2024-08-09 16:14:37,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=84900.0, ans=0.125
2024-08-09 16:14:40,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=84900.0, ans=0.05
2024-08-09 16:14:42,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=84900.0, ans=0.2
2024-08-09 16:14:42,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=84900.0, ans=0.125
2024-08-09 16:14:47,852 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 from AS
2024-08-09 16:14:51,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.954e+01 3.407e+01 4.304e+01 7.894e+01, threshold=6.814e+01, percent-clipped=2.0
2024-08-09 16:14:51,059 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8500, loss[loss=0.1203, beats_loss=0.01266, ecapa_loss=0.000391, whisper_loss=0.1037, over 17508.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01328, ecapa_loss=0.0004299, whisper_loss=0.1057, over 3879909.53 frames. ], batch size: 71, lr: 3.66e-02, grad_scale: 512.0
2024-08-09 16:15:07,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0
2024-08-09 16:15:22,924 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 from AS
2024-08-09 16:15:29,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=85200.0, ans=0.0
2024-08-09 16:15:35,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0
2024-08-09 16:15:45,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=85200.0, ans=0.07
2024-08-09 16:16:01,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=85300.0, ans=0.125
2024-08-09 16:16:04,691 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 9 from Vox, 25 from AS
2024-08-09 16:16:12,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=85400.0, ans=0.125
2024-08-09 16:16:14,939 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS
2024-08-09 16:16:22,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0
2024-08-09 16:16:22,970 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS
2024-08-09 16:16:26,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8550, loss[loss=0.1159, beats_loss=0.01387, ecapa_loss=0.0005159, whisper_loss=0.09684, over 21633.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01321, ecapa_loss=0.0004316, whisper_loss=0.106, over 3853760.17 frames. ], batch size: 91, lr: 3.65e-02, grad_scale: 512.0
2024-08-09 16:16:35,765 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS
2024-08-09 16:17:25,442 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 from AS
2024-08-09 16:17:31,685 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS
2024-08-09 16:17:45,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.78 vs. limit=22.5
2024-08-09 16:17:45,900 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 27 from Vox, 30 from AS
2024-08-09 16:17:48,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0
2024-08-09 16:18:03,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.923e+01 3.374e+01 4.145e+01 6.398e+01, threshold=6.748e+01, percent-clipped=0.0
2024-08-09 16:18:03,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8600, loss[loss=0.1307, beats_loss=0.01391, ecapa_loss=0.0003727, whisper_loss=0.113, over 21427.00 frames. ], tot_loss[loss=0.1242, beats_loss=0.01319, ecapa_loss=0.0004309, whisper_loss=0.1067, over 3847716.16 frames. ], batch size: 83, lr: 3.65e-02, grad_scale: 512.0
2024-08-09 16:18:14,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0
2024-08-09 16:18:19,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=86000.0, ans=0.2
2024-08-09 16:18:26,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=86100.0, ans=0.0
2024-08-09 16:18:32,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.97 vs. limit=15.0
2024-08-09 16:18:54,856 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS
2024-08-09 16:19:07,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0
2024-08-09 16:19:15,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=86300.0, ans=0.125
2024-08-09 16:19:21,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2024-08-09 16:19:35,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2024-08-09 16:19:35,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0
2024-08-09 16:19:40,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8650, loss[loss=0.126, beats_loss=0.01665, ecapa_loss=0.0004733, whisper_loss=0.1046, over 16497.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01326, ecapa_loss=0.0004296, whisper_loss=0.106, over 3833760.34 frames. ], batch size: 70, lr: 3.64e-02, grad_scale: 512.0
2024-08-09 16:19:49,993 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 from AS
2024-08-09 16:19:51,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=86500.0, ans=0.0
2024-08-09 16:19:52,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=86500.0, ans=0.125
2024-08-09 16:20:11,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS
2024-08-09 16:20:16,166 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 from AS
2024-08-09 16:20:17,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=86700.0, ans=0.0
2024-08-09 16:20:20,743 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS
2024-08-09 16:20:28,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5
2024-08-09 16:20:35,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=86800.0, ans=0.015
2024-08-09 16:20:53,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=86900.0, ans=0.0
2024-08-09 16:20:58,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=87000.0, ans=0.2
2024-08-09 16:20:59,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.913e+01 3.504e+01 4.209e+01 7.626e+01, threshold=7.009e+01, percent-clipped=5.0
2024-08-09 16:20:59,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8700, loss[loss=0.1161, beats_loss=0.01339, ecapa_loss=0.0004335, whisper_loss=0.09834, over 23020.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01337, ecapa_loss=0.0004297, whisper_loss=0.105, over 3833828.28 frames. ], batch size: 93, lr: 3.64e-02, grad_scale: 512.0
2024-08-09 16:21:07,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=87000.0, ans=0.0
2024-08-09 16:21:23,844 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-09 16:21:31,003 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-09 16:21:38,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5
2024-08-09 16:21:54,642 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 from AS
2024-08-09 16:21:56,029 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 39 from LS+wenet, 20 from Vox, 25 from AS
2024-08-09 16:22:00,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=87300.0, ans=0.125
2024-08-09 16:22:01,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87300.0, ans=0.1
2024-08-09 16:22:02,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87300.0, ans=0.1
2024-08-09 16:22:06,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87300.0, ans=0.1
2024-08-09 16:22:14,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5
2024-08-09 16:22:16,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.91 vs. limit=22.5
2024-08-09 16:22:26,403 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 from AS
2024-08-09 16:22:28,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8750, loss[loss=0.1162, beats_loss=0.0136, ecapa_loss=0.0004333, whisper_loss=0.09826, over 19946.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.0134, ecapa_loss=0.0004271, whisper_loss=0.1047, over 3852999.97 frames.
], batch size: 79, lr: 3.63e-02, grad_scale: 512.0 2024-08-09 16:22:38,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=87500.0, ans=0.125 2024-08-09 16:22:43,888 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 16:23:00,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=87700.0, ans=0.125 2024-08-09 16:23:00,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87700.0, ans=0.0 2024-08-09 16:23:26,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87900.0, ans=0.125 2024-08-09 16:23:27,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=15.0 2024-08-09 16:23:34,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=87900.0, ans=0.125 2024-08-09 16:23:39,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.881e+01 3.394e+01 4.029e+01 7.137e+01, threshold=6.788e+01, percent-clipped=1.0 2024-08-09 16:23:39,401 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8800, loss[loss=0.08702, beats_loss=0.0177, ecapa_loss=0.0003201, whisper_loss=0.06612, over 19838.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01345, ecapa_loss=0.0004294, whisper_loss=0.104, over 3876605.43 frames. ], batch size: 81, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:23:44,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=88000.0, ans=0.0 2024-08-09 16:23:45,801 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 16:23:58,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=88100.0, ans=0.95 2024-08-09 16:24:00,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2024-08-09 16:24:14,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=88200.0, ans=0.125 2024-08-09 16:24:21,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2024-08-09 16:24:26,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=88300.0, ans=0.0 2024-08-09 16:24:34,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=88300.0, ans=0.125 2024-08-09 16:24:51,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8850, loss[loss=0.1201, beats_loss=0.01329, ecapa_loss=0.0004176, whisper_loss=0.1026, over 14299.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01342, ecapa_loss=0.0004253, whisper_loss=0.1047, over 3882394.16 frames. ], batch size: 54, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:24:51,768 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 16:24:53,139 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 16:24:54,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=88500.0, ans=0.2 2024-08-09 16:25:01,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=88500.0, ans=10.0 2024-08-09 16:25:15,686 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-09 16:25:31,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=88700.0, ans=0.0 2024-08-09 16:25:34,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.64 vs. limit=15.0 2024-08-09 16:25:36,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=88800.0, ans=0.125 2024-08-09 16:25:48,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2024-08-09 16:26:01,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.897e+01 3.367e+01 4.055e+01 6.951e+01, threshold=6.734e+01, percent-clipped=1.0 2024-08-09 16:26:01,953 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8900, loss[loss=0.1296, beats_loss=0.01215, ecapa_loss=0.0004085, whisper_loss=0.1134, over 18381.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01335, ecapa_loss=0.000425, whisper_loss=0.1046, over 3874780.97 frames. 
], batch size: 71, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:26:05,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=89000.0, ans=0.125 2024-08-09 16:26:34,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=89200.0, ans=0.0 2024-08-09 16:26:47,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=89300.0, ans=0.0 2024-08-09 16:27:10,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 8950, loss[loss=0.1006, beats_loss=0.01103, ecapa_loss=0.0005167, whisper_loss=0.08443, over 18829.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01337, ecapa_loss=0.000422, whisper_loss=0.1047, over 3873975.31 frames. ], batch size: 77, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:27:23,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=89600.0, ans=0.0 2024-08-09 16:27:24,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.00 vs. limit=10.0 2024-08-09 16:27:43,973 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 16:28:14,797 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 16:28:19,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=90000.0, ans=0.2 2024-08-09 16:28:20,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.962e+01 3.391e+01 3.948e+01 7.468e+01, threshold=6.781e+01, percent-clipped=1.0 2024-08-09 16:28:20,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9000, loss[loss=0.09306, beats_loss=0.01438, ecapa_loss=0.0004621, whisper_loss=0.07406, over 18363.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01335, ecapa_loss=0.0004223, whisper_loss=0.1044, over 3878344.76 frames. ], batch size: 78, lr: 3.60e-02, grad_scale: 512.0 2024-08-09 16:28:20,179 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 16:29:00,144 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on ASR_libri: loss=0.2932, beats_loss=0, ecapa_loss=0.001188, whisper_loss=0.2813, over 922467.00 frames. 2024-08-09 16:29:16,757 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on SV_voxceleb1: loss=0.01105, beats_loss=0, ecapa_loss=0.001105, whisper_loss=0, over 939242.00 frames. 2024-08-09 16:31:05,593 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3442, 2.6007, 2.4620, 2.1593], device='cuda:3') 2024-08-09 16:31:15,764 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on AT_audioset: loss=0.03209, beats_loss=0.03209, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 16:31:15,768 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 16:31:22,774 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 16:31:25,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=90000.0, ans=0.125 2024-08-09 16:31:34,142 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 16:31:46,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90200.0, ans=0.1 2024-08-09 16:31:47,661 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 16:31:52,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=90200.0, ans=0.125 2024-08-09 16:32:02,759 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 16:32:03,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=90300.0, ans=0.0 2024-08-09 16:32:11,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=90400.0, ans=0.125 2024-08-09 16:32:12,310 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 16:32:16,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=90400.0, ans=0.2 2024-08-09 16:32:22,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=90400.0, ans=0.125 2024-08-09 16:32:24,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9050, loss[loss=0.1271, beats_loss=0.01243, ecapa_loss=0.0004008, whisper_loss=0.1107, over 21835.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.0133, ecapa_loss=0.0004228, whisper_loss=0.1047, over 3904842.11 frames. 
], batch size: 88, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:32:38,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90600.0, ans=0.1 2024-08-09 16:32:50,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2024-08-09 16:32:53,974 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 25 from Vox, 16 fro AS 2024-08-09 16:33:04,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90800.0, ans=0.125 2024-08-09 16:33:20,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=90900.0, ans=0.125 2024-08-09 16:33:32,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.994e+01 3.542e+01 4.086e+01 6.210e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 16:33:32,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9100, loss[loss=0.1174, beats_loss=0.01183, ecapa_loss=0.0003689, whisper_loss=0.1019, over 15743.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01325, ecapa_loss=0.0004255, whisper_loss=0.1048, over 3901178.39 frames. ], batch size: 59, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:33:34,413 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-09 16:33:48,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.04 vs. 
limit=12.0 2024-08-09 16:33:55,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=91100.0, ans=0.125 2024-08-09 16:34:14,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=91300.0, ans=0.125 2024-08-09 16:34:26,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=91400.0, ans=0.0 2024-08-09 16:34:31,692 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-09 16:34:34,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=91400.0, ans=0.125 2024-08-09 16:34:41,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9150, loss[loss=0.1123, beats_loss=0.01433, ecapa_loss=0.0004094, whisper_loss=0.09387, over 16236.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.01322, ecapa_loss=0.0004241, whisper_loss=0.105, over 3895903.15 frames. ], batch size: 65, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:35:05,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=91600.0, ans=0.125 2024-08-09 16:35:08,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91700.0, ans=0.125 2024-08-09 16:35:08,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.55 vs. 
limit=22.5 2024-08-09 16:35:14,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=91700.0, ans=0.125 2024-08-09 16:35:15,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=91700.0, ans=0.2 2024-08-09 16:35:26,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=91800.0, ans=0.0 2024-08-09 16:35:27,728 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 16:35:29,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=91800.0, ans=0.125 2024-08-09 16:35:30,461 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-09 16:35:34,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=91900.0, ans=0.125 2024-08-09 16:35:49,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.827e+01 3.202e+01 3.925e+01 7.636e+01, threshold=6.404e+01, percent-clipped=0.0 2024-08-09 16:35:49,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9200, loss[loss=0.125, beats_loss=0.01142, ecapa_loss=0.0004144, whisper_loss=0.1094, over 22773.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01323, ecapa_loss=0.000423, whisper_loss=0.1051, over 3908798.75 frames. ], batch size: 91, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:35:50,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92000.0, ans=0.125 2024-08-09 16:36:02,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.77 vs. 
limit=12.0 2024-08-09 16:36:04,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.74 vs. limit=22.5 2024-08-09 16:36:08,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=92100.0, ans=0.0 2024-08-09 16:36:11,910 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 16:36:22,241 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.406e+00 2024-08-09 16:36:24,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2024-08-09 16:36:26,166 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 16:36:37,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92300.0, ans=0.1 2024-08-09 16:36:42,555 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 16:36:58,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9250, loss[loss=0.09699, beats_loss=0.01543, ecapa_loss=0.0003618, whisper_loss=0.07794, over 21000.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01324, ecapa_loss=0.000423, whisper_loss=0.1046, over 3903344.21 frames. ], batch size: 83, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:37:13,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=15.0 2024-08-09 16:37:14,032 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 16:37:18,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=92600.0, ans=0.07 2024-08-09 16:37:25,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92700.0, ans=0.1 2024-08-09 16:37:29,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=92700.0, ans=0.0 2024-08-09 16:37:38,650 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 16:37:49,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92800.0, ans=0.1 2024-08-09 16:37:50,344 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-09 16:38:05,722 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 16:38:07,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.067e+01 3.450e+01 4.093e+01 6.352e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 16:38:07,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9300, loss[loss=0.09411, beats_loss=0.01373, ecapa_loss=0.0003942, whisper_loss=0.07643, over 17076.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01322, ecapa_loss=0.0004215, whisper_loss=0.1048, over 3941180.65 frames. ], batch size: 68, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:38:11,555 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-09 16:38:15,563 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
37 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-09 16:38:17,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=93000.0, ans=0.125 2024-08-09 16:38:18,306 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-09 16:38:20,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-09 16:38:37,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93200.0, ans=0.125 2024-08-09 16:38:46,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.61 vs. limit=10.0 2024-08-09 16:38:53,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-09 16:39:12,304 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 16:39:15,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9350, loss[loss=0.1086, beats_loss=0.01454, ecapa_loss=0.0003697, whisper_loss=0.09032, over 14696.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.0132, ecapa_loss=0.0004224, whisper_loss=0.1051, over 3932338.37 frames. ], batch size: 57, lr: 3.56e-02, grad_scale: 512.0 2024-08-09 16:39:26,024 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 16:39:30,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=93600.0, ans=0.125 2024-08-09 16:39:31,500 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 16:39:34,478 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 16:39:37,014 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 16:39:41,038 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 16:39:52,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=93700.0, ans=0.125 2024-08-09 16:39:56,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.22 vs. limit=10.0 2024-08-09 16:40:15,302 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 16:40:24,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.921e+01 3.226e+01 3.791e+01 1.210e+02, threshold=6.451e+01, percent-clipped=3.0 2024-08-09 16:40:24,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9400, loss[loss=0.1277, beats_loss=0.01348, ecapa_loss=0.000444, whisper_loss=0.1098, over 21766.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01318, ecapa_loss=0.0004222, whisper_loss=0.1057, over 3935682.59 frames. ], batch size: 89, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:40:27,646 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 16:40:31,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94000.0, ans=0.125 2024-08-09 16:40:46,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. 
limit=15.0 2024-08-09 16:40:48,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=94100.0, ans=0.0 2024-08-09 16:40:52,292 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 16:40:54,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=94200.0, ans=0.125 2024-08-09 16:40:59,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94200.0, ans=0.0 2024-08-09 16:41:07,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=94300.0, ans=0.0 2024-08-09 16:41:14,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=94300.0, ans=0.0 2024-08-09 16:41:15,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=94300.0, ans=0.125 2024-08-09 16:41:22,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:22,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:30,273 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:41:32,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9450, loss[loss=0.1173, beats_loss=0.01538, ecapa_loss=0.000358, whisper_loss=0.09833, over 21443.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01327, ecapa_loss=0.0004196, whisper_loss=0.1049, over 3912963.29 frames. 
], batch size: 81, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:41:33,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=94500.0, ans=22.5 2024-08-09 16:41:48,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2024-08-09 16:42:05,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94700.0, ans=0.125 2024-08-09 16:42:14,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=94800.0, ans=0.125 2024-08-09 16:42:15,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=94800.0, ans=0.125 2024-08-09 16:42:40,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.979e+01 3.573e+01 4.112e+01 7.498e+01, threshold=7.146e+01, percent-clipped=2.0 2024-08-09 16:42:40,499 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9500, loss[loss=0.1252, beats_loss=0.0153, ecapa_loss=0.0003968, whisper_loss=0.1059, over 18991.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.01335, ecapa_loss=0.0004192, whisper_loss=0.1049, over 3899086.80 frames. ], batch size: 73, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:42:53,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=95100.0, ans=0.125 2024-08-09 16:43:24,293 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 16:43:24,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. 
limit=22.5 2024-08-09 16:43:41,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=95400.0, ans=0.5 2024-08-09 16:43:47,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=95500.0, ans=0.125 2024-08-09 16:43:48,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9550, loss[loss=0.1084, beats_loss=0.01277, ecapa_loss=0.000429, whisper_loss=0.09137, over 16524.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01345, ecapa_loss=0.0004184, whisper_loss=0.1043, over 3890802.65 frames. ], batch size: 63, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:43:51,935 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-09 16:44:13,898 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-09 16:44:18,021 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 16:44:36,865 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 16:44:38,426 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 16:44:54,102 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-09 16:44:56,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.093e+01 3.544e+01 4.156e+01 7.056e+01, threshold=7.088e+01, percent-clipped=0.0 2024-08-09 16:44:56,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9600, loss[loss=0.1353, beats_loss=0.01288, ecapa_loss=0.0004308, whisper_loss=0.1181, over 22537.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01338, ecapa_loss=0.0004205, whisper_loss=0.1044, over 3874079.41 frames. 
], batch size: 91, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:44:58,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96000.0, ans=0.1 2024-08-09 16:45:23,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-09 16:45:24,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=96200.0, ans=0.035 2024-08-09 16:45:30,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2024-08-09 16:45:41,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=96300.0, ans=0.0 2024-08-09 16:45:42,775 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 from AS 2024-08-09 16:45:45,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96300.0, ans=0.125 2024-08-09 16:45:53,392 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-09 16:46:01,825 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 from AS 2024-08-09 16:46:04,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9650, loss[loss=0.133, beats_loss=0.01325, ecapa_loss=0.0005261, whisper_loss=0.1145, over 21925.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01335, ecapa_loss=0.0004197, whisper_loss=0.1047, over 3855509.85 frames. ], batch size: 90, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:46:17,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs.
limit=15.0 2024-08-09 16:46:30,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=96700.0, ans=0.125 2024-08-09 16:46:33,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96700.0, ans=0.125 2024-08-09 16:46:36,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=96700.0, ans=0.0 2024-08-09 16:46:43,702 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS 2024-08-09 16:46:45,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96800.0, ans=0.1 2024-08-09 16:46:53,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=96800.0, ans=0.0 2024-08-09 16:46:55,581 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 21 from Vox, 27 from AS 2024-08-09 16:47:12,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.968e+01 3.449e+01 4.387e+01 7.611e+01, threshold=6.898e+01, percent-clipped=2.0 2024-08-09 16:47:12,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9700, loss[loss=0.1016, beats_loss=0.01246, ecapa_loss=0.0003967, whisper_loss=0.08514, over 13992.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01332, ecapa_loss=0.0004212, whisper_loss=0.1047, over 3849404.69 frames. ], batch size: 53, lr: 3.52e-02, grad_scale: 512.0 2024-08-09 16:47:18,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=97000.0, ans=0.125 2024-08-09 16:47:24,599 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
18 from LS+wenet, 24 from Vox, 29 from AS 2024-08-09 16:47:36,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=97100.0, ans=0.125 2024-08-09 16:47:42,251 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 16:47:49,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97200.0, ans=0.1 2024-08-09 16:47:57,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=97300.0, ans=0.0 2024-08-09 16:48:03,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.06 vs. limit=10.0 2024-08-09 16:48:09,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97400.0, ans=0.1 2024-08-09 16:48:11,503 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 17 from LS+wenet, 20 from Vox, 45 from AS 2024-08-09 16:48:15,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=97400.0, ans=0.0 2024-08-09 16:48:17,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-08-09 16:48:22,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9750, loss[loss=0.1273, beats_loss=0.01294, ecapa_loss=0.0004633, whisper_loss=0.1097, over 22465.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01329, ecapa_loss=0.0004188, whisper_loss=0.1041, over 3834694.72 frames.
], batch size: 93, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:48:41,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-09 16:48:52,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=97700.0, ans=0.125 2024-08-09 16:49:04,197 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 16:49:22,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2024-08-09 16:49:31,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.868e+01 3.333e+01 3.887e+01 7.337e+01, threshold=6.667e+01, percent-clipped=1.0 2024-08-09 16:49:31,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9800, loss[loss=0.117, beats_loss=0.01434, ecapa_loss=0.0004501, whisper_loss=0.09813, over 22042.00 frames. ], tot_loss[loss=0.121, beats_loss=0.0133, ecapa_loss=0.0004194, whisper_loss=0.1035, over 3829245.46 frames. ], batch size: 90, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:49:31,553 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 from AS 2024-08-09 16:49:31,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=98000.0, ans=0.125 2024-08-09 16:49:37,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=98000.0, ans=0.2 2024-08-09 16:49:39,566 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS 2024-08-09 16:49:55,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs.
limit=5.0 2024-08-09 16:50:17,078 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 16:50:20,113 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 from AS 2024-08-09 16:50:30,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=98400.0, ans=0.125 2024-08-09 16:50:40,282 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.527e+00 2024-08-09 16:50:43,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=98400.0, ans=0.0 2024-08-09 16:50:44,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=98400.0, ans=0.0 2024-08-09 16:50:47,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9850, loss[loss=0.1299, beats_loss=0.01338, ecapa_loss=0.0004112, whisper_loss=0.1124, over 18124.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01325, ecapa_loss=0.00042, whisper_loss=0.104, over 3858252.92 frames. ], batch size: 68, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:51:02,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=98500.0, ans=0.2 2024-08-09 16:51:21,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-09 16:51:28,602 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 from AS 2024-08-09 16:51:28,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=98700.0, ans=0.0 2024-08-09 16:51:37,024 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
35 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 16:51:42,381 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 29 from Vox, 30 from AS 2024-08-09 16:51:46,526 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-09 16:51:52,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=98800.0, ans=0.0 2024-08-09 16:52:11,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.990e+01 3.470e+01 4.121e+01 8.675e+01, threshold=6.939e+01, percent-clipped=3.0 2024-08-09 16:52:11,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9900, loss[loss=0.1279, beats_loss=0.01367, ecapa_loss=0.0003746, whisper_loss=0.1105, over 18472.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01323, ecapa_loss=0.0004197, whisper_loss=0.1045, over 3880636.43 frames. ], batch size: 73, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:52:27,763 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 10 from Vox, 33 from AS 2024-08-09 16:52:31,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=99100.0, ans=15.0 2024-08-09 16:52:39,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-09 16:52:40,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=99100.0, ans=0.0 2024-08-09 16:52:44,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs.
limit=15.0 2024-08-09 16:52:47,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=99200.0, ans=0.125 2024-08-09 16:53:13,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=99300.0, ans=0.0 2024-08-09 16:53:23,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=99400.0, ans=0.125 2024-08-09 16:53:24,163 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 from AS 2024-08-09 16:53:27,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99400.0, ans=0.1 2024-08-09 16:53:27,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=99400.0, ans=0.0 2024-08-09 16:53:35,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 9950, loss[loss=0.1553, beats_loss=0.01147, ecapa_loss=0.0004146, whisper_loss=0.1397, over 23482.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01328, ecapa_loss=0.000416, whisper_loss=0.1038, over 3869900.21 frames. ], batch size: 90, lr: 3.49e-02, grad_scale: 512.0 2024-08-09 16:53:42,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=99500.0, ans=0.0 2024-08-09 16:53:42,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.40 vs. limit=15.0 2024-08-09 16:53:56,958 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 from AS 2024-08-09 16:54:05,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.36 vs.
limit=15.0 2024-08-09 16:54:06,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99600.0, ans=0.1 2024-08-09 16:54:07,773 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-09 16:54:08,930 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 18 from Vox, 39 from AS 2024-08-09 16:54:16,414 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 from AS 2024-08-09 16:54:18,151 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 16:54:21,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=99700.0, ans=0.0 2024-08-09 16:54:48,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2024-08-09 16:54:53,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.893e+01 3.392e+01 3.870e+01 8.367e+01, threshold=6.783e+01, percent-clipped=1.0 2024-08-09 16:54:53,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10000, loss[loss=0.1451, beats_loss=0.01012, ecapa_loss=0.0003959, whisper_loss=0.131, over 18912.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01331, ecapa_loss=0.000417, whisper_loss=0.1038, over 3874879.08 frames. ], batch size: 71, lr: 3.49e-02, grad_scale: 1024.0 2024-08-09 16:54:54,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=100000.0, ans=0.2 2024-08-09 16:54:55,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs.
limit=6.0 2024-08-09 16:54:55,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=100000.0, ans=0.0 2024-08-09 16:54:58,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100000.0, ans=0.1 2024-08-09 16:55:17,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5 2024-08-09 16:55:39,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=100300.0, ans=0.125 2024-08-09 16:55:40,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=100300.0, ans=0.0 2024-08-09 16:55:47,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=100300.0, ans=0.125 2024-08-09 16:55:50,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=100400.0, ans=0.0 2024-08-09 16:55:51,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=100400.0, ans=0.2 2024-08-09 16:55:53,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0 2024-08-09 16:56:03,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10050, loss[loss=0.1581, beats_loss=0.008461, ecapa_loss=0.0004309, whisper_loss=0.1454, over 17888.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01324, ecapa_loss=0.0004172, whisper_loss=0.1042, over 3881213.22 frames. 
], batch size: 71, lr: 3.48e-02, grad_scale: 1024.0 2024-08-09 16:56:26,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=100600.0, ans=0.0 2024-08-09 16:56:36,054 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS 2024-08-09 16:56:37,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=100700.0, ans=0.2 2024-08-09 16:57:02,424 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 from AS 2024-08-09 16:57:08,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=100900.0, ans=15.0 2024-08-09 16:57:12,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.921e+01 3.378e+01 4.111e+01 6.632e+01, threshold=6.756e+01, percent-clipped=0.0 2024-08-09 16:57:12,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10100, loss[loss=0.1153, beats_loss=0.01284, ecapa_loss=0.0004683, whisper_loss=0.09776, over 17985.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01325, ecapa_loss=0.0004172, whisper_loss=0.1033, over 3886341.17 frames. ], batch size: 76, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:57:14,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-09 16:57:15,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=101000.0, ans=0.125 2024-08-09 16:57:18,331 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-09 16:57:23,848 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-09 16:57:25,259 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 from AS 2024-08-09 16:57:48,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=101200.0, ans=0.125 2024-08-09 16:57:58,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-09 16:58:06,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=101400.0, ans=0.125 2024-08-09 16:58:14,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-08-09 16:58:16,836 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 16:58:20,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10150, loss[loss=0.1228, beats_loss=0.01516, ecapa_loss=0.0003807, whisper_loss=0.1038, over 16714.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01323, ecapa_loss=0.0004215, whisper_loss=0.103, over 3906782.55 frames. ], batch size: 67, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:58:32,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.65 vs.
limit=22.5 2024-08-09 16:58:45,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=101600.0, ans=0.125 2024-08-09 16:59:07,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=101800.0, ans=10.0 2024-08-09 16:59:12,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=101800.0, ans=0.2 2024-08-09 16:59:23,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=101900.0, ans=0.0 2024-08-09 16:59:29,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.923e+01 3.411e+01 4.089e+01 6.898e+01, threshold=6.822e+01, percent-clipped=2.0 2024-08-09 16:59:30,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10200, loss[loss=0.1292, beats_loss=0.01168, ecapa_loss=0.0003868, whisper_loss=0.1136, over 23140.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01326, ecapa_loss=0.0004197, whisper_loss=0.1029, over 3905087.07 frames. ], batch size: 90, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 16:59:55,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102100.0, ans=0.1 2024-08-09 17:00:01,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.11 vs. limit=15.0 2024-08-09 17:00:16,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=102300.0, ans=0.125 2024-08-09 17:00:28,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=102400.0, ans=0.125 2024-08-09 17:00:29,278 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
22 from LS+wenet, 13 from Vox, 29 from AS 2024-08-09 17:00:32,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102400.0, ans=0.0 2024-08-09 17:00:32,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=102400.0, ans=0.125 2024-08-09 17:00:38,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10250, loss[loss=0.1239, beats_loss=0.01108, ecapa_loss=0.0004309, whisper_loss=0.1085, over 14688.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01315, ecapa_loss=0.0004159, whisper_loss=0.1038, over 3900869.97 frames. ], batch size: 54, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 17:00:46,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=102500.0, ans=0.0 2024-08-09 17:01:02,032 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 17:01:02,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=102600.0, ans=0.125 2024-08-09 17:01:02,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.70 vs. limit=22.5 2024-08-09 17:01:16,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102700.0, ans=0.1 2024-08-09 17:01:19,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=102800.0, ans=0.2 2024-08-09 17:01:20,117 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 from AS 2024-08-09 17:01:27,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.86 vs.
limit=15.0 2024-08-09 17:01:45,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102900.0, ans=0.0 2024-08-09 17:01:47,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.938e+01 3.467e+01 4.292e+01 7.706e+01, threshold=6.934e+01, percent-clipped=1.0 2024-08-09 17:01:47,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10300, loss[loss=0.1255, beats_loss=0.01462, ecapa_loss=0.0003199, whisper_loss=0.1077, over 20954.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01311, ecapa_loss=0.0004131, whisper_loss=0.1036, over 3873939.90 frames. ], batch size: 79, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:01:55,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=103000.0, ans=0.2 2024-08-09 17:02:05,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=103100.0, ans=0.0 2024-08-09 17:02:06,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=103100.0, ans=0.2 2024-08-09 17:02:23,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=103200.0, ans=0.125 2024-08-09 17:02:25,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=103200.0, ans=0.125 2024-08-09 17:02:40,303 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 from AS 2024-08-09 17:02:45,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=103400.0, ans=0.125 2024-08-09 17:02:46,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs.
limit=15.0 2024-08-09 17:02:54,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10350, loss[loss=0.1031, beats_loss=0.01546, ecapa_loss=0.0003885, whisper_loss=0.08376, over 17275.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01316, ecapa_loss=0.0004138, whisper_loss=0.1034, over 3898559.67 frames. ], batch size: 71, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:02:56,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=103500.0, ans=0.125 2024-08-09 17:02:59,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=103500.0, ans=0.0 2024-08-09 17:03:01,570 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 from AS 2024-08-09 17:03:02,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.50 vs. limit=10.0 2024-08-09 17:03:13,952 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 16 from Vox, 40 from AS 2024-08-09 17:03:16,573 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 26 from Vox, 31 from AS 2024-08-09 17:03:21,881 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 from AS 2024-08-09 17:03:23,226 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS 2024-08-09 17:03:38,418 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 from AS 2024-08-09 17:03:41,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=103800.0, ans=0.125 2024-08-09 17:03:50,867 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts.
24 from LS+wenet, 35 from Vox, 36 from AS 2024-08-09 17:03:52,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=103900.0, ans=0.0 2024-08-09 17:04:02,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 3.016e+01 3.413e+01 4.405e+01 7.924e+01, threshold=6.827e+01, percent-clipped=1.0 2024-08-09 17:04:02,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10400, loss[loss=0.08441, beats_loss=0.01631, ecapa_loss=0.0004395, whisper_loss=0.06371, over 14814.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01327, ecapa_loss=0.0004108, whisper_loss=0.1034, over 3900052.71 frames. ], batch size: 63, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:04:06,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=104000.0, ans=0.125 2024-08-09 17:04:07,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=104000.0, ans=0.2 2024-08-09 17:04:42,426 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.220e+01 2024-08-09 17:04:46,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=104300.0, ans=0.0 2024-08-09 17:04:51,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=104300.0, ans=0.125 2024-08-09 17:04:54,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=104300.0, ans=0.0 2024-08-09 17:04:55,876 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 14 from Vox, 33 from AS 2024-08-09 17:05:01,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs.
limit=15.0 2024-08-09 17:05:04,245 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS 2024-08-09 17:05:05,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=104400.0, ans=0.125 2024-08-09 17:05:12,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10450, loss[loss=0.1214, beats_loss=0.0135, ecapa_loss=0.000357, whisper_loss=0.1043, over 21991.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01332, ecapa_loss=0.0004085, whisper_loss=0.1027, over 3895203.77 frames. ], batch size: 89, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:05:15,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-09 17:05:22,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=104500.0, ans=0.125 2024-08-09 17:05:26,349 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 13 from Vox, 35 from AS 2024-08-09 17:05:27,919 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
29 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 17:05:43,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=104700.0, ans=0.1 2024-08-09 17:05:45,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=104700.0, ans=0.125 2024-08-09 17:05:45,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=104700.0, ans=0.125 2024-08-09 17:06:08,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104900.0, ans=0.1 2024-08-09 17:06:14,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=104900.0, ans=0.125 2024-08-09 17:06:22,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 3.012e+01 3.451e+01 3.999e+01 6.423e+01, threshold=6.903e+01, percent-clipped=0.0 2024-08-09 17:06:22,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10500, loss[loss=0.1003, beats_loss=0.01613, ecapa_loss=0.0003494, whisper_loss=0.08066, over 15220.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01317, ecapa_loss=0.0004086, whisper_loss=0.103, over 3853800.46 frames. ], batch size: 61, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:06:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105000.0, ans=0.1 2024-08-09 17:06:30,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=105000.0, ans=0.125 2024-08-09 17:06:36,636 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 from AS 2024-08-09 17:06:58,942 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 17:07:13,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=105300.0, ans=0.0 2024-08-09 17:07:18,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=105400.0, ans=0.125 2024-08-09 17:07:22,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=105400.0, ans=0.0 2024-08-09 17:07:29,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=105400.0, ans=0.2 2024-08-09 17:07:32,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10550, loss[loss=0.0959, beats_loss=0.01456, ecapa_loss=0.0003833, whisper_loss=0.07751, over 13470.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01322, ecapa_loss=0.0004056, whisper_loss=0.1027, over 3830289.83 frames. ], batch size: 55, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:07:36,770 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 17:07:42,163 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 17:07:44,732 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 17:07:57,579 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 17:08:07,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=105700.0, ans=0.0 2024-08-09 17:08:27,799 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 17:08:41,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.989e+01 3.482e+01 4.095e+01 9.318e+01, threshold=6.964e+01, percent-clipped=2.0 2024-08-09 17:08:41,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10600, loss[loss=0.1027, beats_loss=0.01408, ecapa_loss=0.0003296, whisper_loss=0.08536, over 17078.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01316, ecapa_loss=0.0004064, whisper_loss=0.1036, over 3869044.76 frames. ], batch size: 65, lr: 3.42e-02, grad_scale: 1024.0 2024-08-09 17:09:01,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=106100.0, ans=0.0 2024-08-09 17:09:32,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106300.0, ans=0.1 2024-08-09 17:09:32,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=106300.0, ans=0.2 2024-08-09 17:09:33,728 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 17:09:45,064 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-09 17:09:49,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=106400.0, ans=0.125 2024-08-09 17:09:51,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10650, loss[loss=0.09678, beats_loss=0.0148, ecapa_loss=0.0003969, whisper_loss=0.07801, over 21775.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01319, ecapa_loss=0.0004051, whisper_loss=0.1041, over 3851756.14 frames. 
], batch size: 91, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:09:52,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=106500.0, ans=0.09899494936611666 2024-08-09 17:10:00,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106500.0, ans=0.0 2024-08-09 17:10:14,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=106600.0, ans=0.125 2024-08-09 17:10:14,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2024-08-09 17:10:16,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106600.0, ans=0.1 2024-08-09 17:10:17,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=106600.0, ans=0.2 2024-08-09 17:10:24,236 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 17:10:26,818 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 17:10:27,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=106700.0, ans=0.125 2024-08-09 17:10:41,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.53 vs. 
limit=22.5 2024-08-09 17:11:01,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 3.109e+01 3.454e+01 4.119e+01 5.374e+01, threshold=6.908e+01, percent-clipped=0.0 2024-08-09 17:11:01,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10700, loss[loss=0.09497, beats_loss=0.01652, ecapa_loss=0.0003453, whisper_loss=0.07499, over 17977.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01317, ecapa_loss=0.0004038, whisper_loss=0.1043, over 3836656.09 frames. ], batch size: 72, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:11:17,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0 2024-08-09 17:11:21,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=107100.0, ans=0.0 2024-08-09 17:11:23,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2024-08-09 17:11:30,987 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 17:11:34,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=107200.0, ans=0.2 2024-08-09 17:11:40,380 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 17:11:54,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-08-09 17:12:10,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10750, loss[loss=0.1264, beats_loss=0.01371, ecapa_loss=0.0003804, whisper_loss=0.1088, over 17859.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01313, ecapa_loss=0.0004039, whisper_loss=0.1046, over 3856624.72 frames. 
], batch size: 70, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:12:11,852 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 17:12:34,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=107600.0, ans=0.0 2024-08-09 17:12:52,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=107800.0, ans=0.0 2024-08-09 17:13:04,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-08-09 17:13:15,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=107900.0, ans=0.0 2024-08-09 17:13:18,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.960e+01 3.558e+01 4.572e+01 9.073e+01, threshold=7.116e+01, percent-clipped=3.0 2024-08-09 17:13:18,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10800, loss[loss=0.1426, beats_loss=0.01548, ecapa_loss=0.0003776, whisper_loss=0.1233, over 23202.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01302, ecapa_loss=0.0004044, whisper_loss=0.106, over 3902794.94 frames. ], batch size: 94, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:13:29,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=108000.0, ans=0.0 2024-08-09 17:13:37,456 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 17:13:45,520 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
35 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 17:13:59,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=108300.0, ans=0.125 2024-08-09 17:14:09,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=108300.0, ans=0.125 2024-08-09 17:14:15,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2024-08-09 17:14:21,178 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 17:14:26,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10850, loss[loss=0.1443, beats_loss=0.009597, ecapa_loss=0.0004268, whisper_loss=0.1304, over 19633.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01297, ecapa_loss=0.0004047, whisper_loss=0.1064, over 3943928.74 frames. ], batch size: 76, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:14:34,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=108500.0, ans=0.07 2024-08-09 17:14:34,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-09 17:14:44,997 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 17:14:51,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=108600.0, ans=0.05 2024-08-09 17:15:04,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=108700.0, ans=0.5 2024-08-09 17:15:10,199 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-09 17:15:14,209 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 17:15:30,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=15.0 2024-08-09 17:15:33,025 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 17:15:35,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.154e+01 3.497e+01 4.138e+01 7.474e+01, threshold=6.993e+01, percent-clipped=1.0 2024-08-09 17:15:35,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10900, loss[loss=0.1223, beats_loss=0.01321, ecapa_loss=0.0004493, whisper_loss=0.1046, over 15564.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01297, ecapa_loss=0.0004018, whisper_loss=0.1067, over 3953873.35 frames. ], batch size: 64, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:15:41,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=15.0 2024-08-09 17:15:42,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=109000.0, ans=0.07 2024-08-09 17:15:45,377 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 17:15:52,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=109100.0, ans=0.125 2024-08-09 17:16:00,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=109100.0, ans=0.125 2024-08-09 17:16:09,745 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-09 17:16:19,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109300.0, ans=0.1 2024-08-09 17:16:23,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=109300.0, ans=0.0 2024-08-09 17:16:43,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 10950, loss[loss=0.1195, beats_loss=0.01241, ecapa_loss=0.0004659, whisper_loss=0.1025, over 16417.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.01299, ecapa_loss=0.0004018, whisper_loss=0.1061, over 3953732.94 frames. ], batch size: 66, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:17:11,222 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 17:17:11,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=109700.0, ans=0.2 2024-08-09 17:17:12,801 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 32 from Vox, 25 fro AS 2024-08-09 17:17:27,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2024-08-09 17:17:29,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=109800.0, ans=0.125 2024-08-09 17:17:42,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=109900.0, ans=0.125 2024-08-09 17:17:51,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.947e+01 3.240e+01 3.931e+01 5.659e+01, threshold=6.481e+01, percent-clipped=0.0 2024-08-09 17:17:51,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11000, loss[loss=0.1054, beats_loss=0.017, ecapa_loss=0.000303, whisper_loss=0.08542, over 20383.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01301, ecapa_loss=0.0004036, whisper_loss=0.1057, over 3955283.91 frames. ], batch size: 82, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:17:52,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=110000.0, ans=0.2 2024-08-09 17:17:57,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=110000.0, ans=0.0 2024-08-09 17:18:16,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-09 17:18:16,971 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 17:18:19,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.46 vs. 
limit=10.0 2024-08-09 17:18:36,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=110300.0, ans=0.2 2024-08-09 17:18:37,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=110300.0, ans=0.0 2024-08-09 17:18:48,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=110400.0, ans=0.0 2024-08-09 17:18:58,248 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 17:19:00,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11050, loss[loss=0.1365, beats_loss=0.01391, ecapa_loss=0.0003347, whisper_loss=0.1192, over 18039.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01293, ecapa_loss=0.0004027, whisper_loss=0.1055, over 3934313.09 frames. ], batch size: 69, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:19:13,195 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 17:19:32,706 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-09 17:19:36,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.10 vs. limit=22.5 2024-08-09 17:19:43,193 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 17:19:48,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=110800.0, ans=0.125 2024-08-09 17:19:56,581 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 17:20:00,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. 
limit=15.0 2024-08-09 17:20:02,615 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 17:20:05,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110900.0, ans=0.1 2024-08-09 17:20:06,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=110900.0, ans=0.125 2024-08-09 17:20:10,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 3.030e+01 3.567e+01 4.272e+01 6.137e+01, threshold=7.134e+01, percent-clipped=0.0 2024-08-09 17:20:10,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11100, loss[loss=0.1225, beats_loss=0.01003, ecapa_loss=0.0003852, whisper_loss=0.1086, over 17100.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01298, ecapa_loss=0.0004003, whisper_loss=0.1051, over 3887171.34 frames. ], batch size: 65, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:20:10,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111000.0, ans=0.125 2024-08-09 17:20:10,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111000.0, ans=0.1 2024-08-09 17:20:11,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=111000.0, ans=22.5 2024-08-09 17:20:19,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=111000.0, ans=0.0 2024-08-09 17:20:27,280 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 17:21:11,237 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
17 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-09 17:21:19,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11150, loss[loss=0.104, beats_loss=0.01457, ecapa_loss=0.0004022, whisper_loss=0.08544, over 16765.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01301, ecapa_loss=0.0003989, whisper_loss=0.1046, over 3873447.59 frames. ], batch size: 65, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:21:20,809 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 17:21:25,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=111500.0, ans=0.0 2024-08-09 17:21:26,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111500.0, ans=0.125 2024-08-09 17:21:27,962 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-09 17:21:36,505 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.332e+00 2024-08-09 17:21:38,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2024-08-09 17:21:46,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111700.0, ans=0.125 2024-08-09 17:21:54,208 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-09 17:22:12,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=111800.0, ans=0.0 2024-08-09 17:22:23,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=111900.0, ans=0.0 2024-08-09 17:22:28,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.940e+01 3.532e+01 4.042e+01 6.455e+01, threshold=7.065e+01, percent-clipped=0.0 2024-08-09 17:22:28,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11200, loss[loss=0.1143, beats_loss=0.01395, ecapa_loss=0.0003869, whisper_loss=0.09652, over 19617.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01301, ecapa_loss=0.0003982, whisper_loss=0.1045, over 3900260.07 frames. ], batch size: 79, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:22:28,785 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-09 17:22:36,823 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-09 17:22:42,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=112100.0, ans=0.125 2024-08-09 17:22:44,101 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 18 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-09 17:22:46,765 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 17:22:58,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-09 17:23:06,567 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 17:23:15,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112300.0, ans=0.125 2024-08-09 17:23:27,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2024-08-09 17:23:34,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.03 vs. limit=22.5 2024-08-09 17:23:37,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11250, loss[loss=0.1444, beats_loss=0.01036, ecapa_loss=0.0003838, whisper_loss=0.1302, over 22475.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01298, ecapa_loss=0.0003989, whisper_loss=0.1044, over 3877504.28 frames. ], batch size: 85, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:23:47,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-09 17:23:53,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=112600.0, ans=0.025 2024-08-09 17:23:54,800 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 17:24:04,595 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-09 17:24:10,149 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 17:24:13,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=112700.0, ans=22.5 2024-08-09 17:24:13,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2024-08-09 17:24:24,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=112800.0, ans=0.0 2024-08-09 17:24:24,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112800.0, ans=0.1 2024-08-09 17:24:28,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=112800.0, ans=0.0 2024-08-09 17:24:38,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=15.0 2024-08-09 17:24:40,843 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-09 17:24:42,204 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 17:24:47,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.986e+01 3.509e+01 4.225e+01 7.875e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-09 17:24:47,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11300, loss[loss=0.1115, beats_loss=0.01256, ecapa_loss=0.0004202, whisper_loss=0.09473, over 20647.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01298, ecapa_loss=0.0003975, whisper_loss=0.1043, over 3884251.02 frames. ], batch size: 88, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:24:48,858 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 17:24:53,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=113000.0, ans=0.0 2024-08-09 17:24:56,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5 2024-08-09 17:25:28,372 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 17:25:44,839 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-09 17:25:56,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11350, loss[loss=0.1098, beats_loss=0.01398, ecapa_loss=0.0003489, whisper_loss=0.09235, over 22780.00 frames. ], tot_loss[loss=0.121, beats_loss=0.013, ecapa_loss=0.0003971, whisper_loss=0.104, over 3891419.61 frames. ], batch size: 90, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:26:23,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=113700.0, ans=0.2 2024-08-09 17:26:23,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113700.0, ans=0.1 2024-08-09 17:26:24,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.51 vs. 
limit=15.0 2024-08-09 17:26:26,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=113700.0, ans=10.0 2024-08-09 17:26:30,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=113700.0, ans=0.0 2024-08-09 17:26:44,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=113800.0, ans=0.0 2024-08-09 17:27:05,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114000.0, ans=0.1 2024-08-09 17:27:06,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.900e+01 3.368e+01 4.036e+01 6.013e+01, threshold=6.736e+01, percent-clipped=0.0 2024-08-09 17:27:06,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11400, loss[loss=0.1158, beats_loss=0.01034, ecapa_loss=0.00041, whisper_loss=0.1013, over 17569.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.013, ecapa_loss=0.000397, whisper_loss=0.1041, over 3879258.05 frames. ], batch size: 71, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:27:08,112 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 17:27:08,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=114000.0, ans=0.2 2024-08-09 17:27:17,656 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 17:27:19,021 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 17:27:24,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=114100.0, ans=0.0 2024-08-09 17:27:34,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=15.0 2024-08-09 17:27:38,166 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 17:27:45,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=114200.0, ans=0.125 2024-08-09 17:27:55,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=114300.0, ans=0.0 2024-08-09 17:28:02,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=114400.0, ans=15.0 2024-08-09 17:28:15,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11450, loss[loss=0.1283, beats_loss=0.01436, ecapa_loss=0.000381, whisper_loss=0.1101, over 14041.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01305, ecapa_loss=0.0003971, whisper_loss=0.1039, over 3880178.34 frames. ], batch size: 54, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:29:26,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 3.054e+01 3.515e+01 4.307e+01 8.084e+01, threshold=7.029e+01, percent-clipped=1.0 2024-08-09 17:29:27,015 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11500, loss[loss=0.1306, beats_loss=0.01242, ecapa_loss=0.0003661, whisper_loss=0.1145, over 20387.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01314, ecapa_loss=0.0003952, whisper_loss=0.1034, over 3855266.04 frames. 
], batch size: 79, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:29:37,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=115000.0, ans=0.125 2024-08-09 17:29:46,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=115100.0, ans=0.125 2024-08-09 17:30:00,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=115200.0, ans=0.125 2024-08-09 17:30:14,253 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 17:30:21,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=115300.0, ans=0.0 2024-08-09 17:30:25,700 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 17:30:40,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-09 17:30:41,363 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11550, loss[loss=0.1189, beats_loss=0.01271, ecapa_loss=0.0005041, whisper_loss=0.1012, over 21211.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01319, ecapa_loss=0.0003938, whisper_loss=0.1034, over 3833830.20 frames. ], batch size: 91, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:30:44,350 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 17:31:04,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=115600.0, ans=15.0 2024-08-09 17:31:08,630 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
37 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 17:31:09,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115700.0, ans=0.1 2024-08-09 17:31:10,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115700.0, ans=0.0 2024-08-09 17:31:11,433 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 17:31:16,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115700.0, ans=0.125 2024-08-09 17:31:22,956 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 17:31:23,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=115800.0, ans=0.2 2024-08-09 17:31:33,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.58 vs. limit=15.0 2024-08-09 17:31:53,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.957e+01 3.409e+01 3.917e+01 8.485e+01, threshold=6.817e+01, percent-clipped=1.0 2024-08-09 17:31:53,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11600, loss[loss=0.1124, beats_loss=0.01466, ecapa_loss=0.0003955, whisper_loss=0.0938, over 15193.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.0132, ecapa_loss=0.0003934, whisper_loss=0.103, over 3857439.52 frames. 
], batch size: 62, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:32:00,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116000.0, ans=0.125 2024-08-09 17:32:24,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116200.0, ans=0.1 2024-08-09 17:32:32,121 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 17:32:36,254 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 17:32:38,872 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 17:32:46,063 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 17:32:57,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=116400.0, ans=0.0 2024-08-09 17:32:58,459 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 17 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-09 17:33:02,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-09 17:33:04,435 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 17:33:07,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11650, loss[loss=0.1322, beats_loss=0.01243, ecapa_loss=0.00039, whisper_loss=0.1159, over 17236.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01325, ecapa_loss=0.0003926, whisper_loss=0.1022, over 3869471.32 frames. 
], batch size: 69, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:33:09,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=116500.0, ans=0.125 2024-08-09 17:33:19,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=116500.0, ans=0.0 2024-08-09 17:33:29,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-08-09 17:33:55,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=116800.0, ans=0.125 2024-08-09 17:33:56,009 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 17:34:04,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=116900.0, ans=0.125 2024-08-09 17:34:08,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=116900.0, ans=0.0 2024-08-09 17:34:16,227 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 17:34:18,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 3.106e+01 3.561e+01 4.217e+01 8.775e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 17:34:18,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11700, loss[loss=0.09637, beats_loss=0.0146, ecapa_loss=0.0003995, whisper_loss=0.07777, over 16399.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01327, ecapa_loss=0.0003907, whisper_loss=0.1029, over 3876766.48 frames. 
], batch size: 66, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:35:18,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=117400.0, ans=0.1 2024-08-09 17:35:18,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=117400.0, ans=0.125 2024-08-09 17:35:27,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117400.0, ans=0.1 2024-08-09 17:35:29,643 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-09 17:35:29,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=117500.0, ans=0.015 2024-08-09 17:35:30,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11750, loss[loss=0.1348, beats_loss=0.01115, ecapa_loss=0.00048, whisper_loss=0.1188, over 19608.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01329, ecapa_loss=0.0003889, whisper_loss=0.1025, over 3907190.90 frames. ], batch size: 81, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:35:32,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=117500.0, ans=0.0 2024-08-09 17:35:44,869 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-09 17:35:45,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=117600.0, ans=0.2 2024-08-09 17:35:50,290 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 17:35:56,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. 
limit=15.0 2024-08-09 17:36:00,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-09 17:36:04,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2024-08-09 17:36:05,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=117700.0, ans=0.0 2024-08-09 17:36:10,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=117700.0, ans=0.0 2024-08-09 17:36:12,903 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-09 17:36:15,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-09 17:36:27,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=117900.0, ans=0.0 2024-08-09 17:36:40,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.943e+01 3.344e+01 4.022e+01 9.659e+01, threshold=6.689e+01, percent-clipped=1.0 2024-08-09 17:36:40,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11800, loss[loss=0.1468, beats_loss=0.01078, ecapa_loss=0.0004561, whisper_loss=0.1315, over 21499.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01334, ecapa_loss=0.0003886, whisper_loss=0.1031, over 3929771.32 frames. 
], batch size: 85, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:36:47,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118000.0, ans=0.1 2024-08-09 17:37:04,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118100.0, ans=0.125 2024-08-09 17:37:09,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0 2024-08-09 17:37:48,179 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:37:49,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=118400.0, ans=0.2 2024-08-09 17:37:50,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118500.0, ans=0.1 2024-08-09 17:37:51,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11850, loss[loss=0.1265, beats_loss=0.01422, ecapa_loss=0.000379, whisper_loss=0.1085, over 22376.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01329, ecapa_loss=0.00039, whisper_loss=0.1036, over 3942664.15 frames. 
], batch size: 90, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:37:51,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118500.0, ans=0.1 2024-08-09 17:37:51,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=118500.0, ans=0.125 2024-08-09 17:37:58,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118500.0, ans=0.1 2024-08-09 17:38:25,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=12.0 2024-08-09 17:38:43,612 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 17:38:48,072 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 17:38:59,966 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 17 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-09 17:39:03,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.937e+01 3.452e+01 4.190e+01 6.711e+01, threshold=6.904e+01, percent-clipped=1.0 2024-08-09 17:39:03,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11900, loss[loss=0.1011, beats_loss=0.01374, ecapa_loss=0.0004082, whisper_loss=0.08325, over 21166.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01328, ecapa_loss=0.0003894, whisper_loss=0.1036, over 3941383.19 frames. ], batch size: 89, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:39:33,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. 
limit=15.0 2024-08-09 17:39:54,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=119300.0, ans=0.125 2024-08-09 17:39:57,432 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 32 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 17:40:00,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:40:11,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-09 17:40:15,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-09 17:40:17,606 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 11950, loss[loss=0.1198, beats_loss=0.01347, ecapa_loss=0.0004005, whisper_loss=0.1023, over 22616.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01323, ecapa_loss=0.0003913, whisper_loss=0.1037, over 3922966.19 frames. ], batch size: 92, lr: 3.28e-02, grad_scale: 1024.0 2024-08-09 17:40:35,330 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 17:40:36,723 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 17:40:43,547 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 17:40:44,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. 
limit=15.0 2024-08-09 17:40:45,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=119700.0, ans=0.0 2024-08-09 17:40:58,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2024-08-09 17:41:00,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119800.0, ans=0.1 2024-08-09 17:41:08,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119800.0, ans=0.1 2024-08-09 17:41:32,826 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-09 17:41:36,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.930e+01 3.462e+01 4.384e+01 7.473e+01, threshold=6.925e+01, percent-clipped=1.0 2024-08-09 17:41:36,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12000, loss[loss=0.09851, beats_loss=0.01567, ecapa_loss=0.0003829, whisper_loss=0.07901, over 21722.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01327, ecapa_loss=0.0003886, whisper_loss=0.1026, over 3911845.51 frames. ], batch size: 92, lr: 3.28e-02, grad_scale: 2048.0 2024-08-09 17:41:36,940 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 17:42:24,882 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on ASR_libri: loss=0.2866, beats_loss=0, ecapa_loss=0.00111, whisper_loss=0.2755, over 922467.00 frames. 2024-08-09 17:42:44,907 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on SV_voxceleb1: loss=0.01049, beats_loss=0, ecapa_loss=0.001049, whisper_loss=0, over 939242.00 frames. 
2024-08-09 17:44:15,707 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7501, 1.7641, 2.5750, 2.5282], device='cuda:3') 2024-08-09 17:44:38,324 INFO [train_multi_KD3.py:1149] (3/4) Epoch 1, validation on AT_audioset: loss=0.03131, beats_loss=0.03131, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 17:44:38,328 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 17:44:46,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=120000.0, ans=0.0 2024-08-09 17:44:48,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=120000.0, ans=0.125 2024-08-09 17:44:48,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-09 17:45:25,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=12.0 2024-08-09 17:45:28,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-09 17:45:30,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=120300.0, ans=0.125 2024-08-09 17:45:46,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=120400.0, ans=0.0 2024-08-09 17:45:57,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12050, loss[loss=0.1499, beats_loss=0.009356, ecapa_loss=0.0004062, whisper_loss=0.1365, over 18070.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01324, ecapa_loss=0.000387, whisper_loss=0.1032, over 3919942.12 frames. 
], batch size: 69, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:45:57,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=120500.0, ans=0.95 2024-08-09 17:45:59,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-09 17:46:02,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=120500.0, ans=0.125 2024-08-09 17:46:06,408 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-09 17:46:28,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.80 vs. limit=22.5 2024-08-09 17:46:37,184 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 17:46:45,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=120800.0, ans=0.2 2024-08-09 17:46:47,670 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 17:46:53,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.67 vs. 
limit=15.0 2024-08-09 17:47:11,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=121000.0, ans=0.125 2024-08-09 17:47:12,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.990e+01 3.554e+01 4.139e+01 7.218e+01, threshold=7.107e+01, percent-clipped=1.0 2024-08-09 17:47:12,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12100, loss[loss=0.1265, beats_loss=0.0146, ecapa_loss=0.0003169, whisper_loss=0.1087, over 14457.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01324, ecapa_loss=0.0003899, whisper_loss=0.103, over 3903256.58 frames. ], batch size: 56, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:47:13,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2024-08-09 17:47:17,514 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 17:47:25,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=121000.0, ans=0.5 2024-08-09 17:47:27,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=121100.0, ans=0.2 2024-08-09 17:47:34,978 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 17:47:42,775 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 17:47:44,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121200.0, ans=0.0 2024-08-09 17:47:58,006 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 17:48:05,619 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-09 17:48:09,074 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 17:48:29,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12150, loss[loss=0.1276, beats_loss=0.01362, ecapa_loss=0.0003311, whisper_loss=0.1106, over 14204.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.0132, ecapa_loss=0.0003892, whisper_loss=0.103, over 3910025.66 frames. ], batch size: 54, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:48:32,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.95 vs. limit=22.5 2024-08-09 17:48:32,617 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 17:48:34,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=121500.0, ans=0.0 2024-08-09 17:48:37,334 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 17:48:40,782 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 17:48:42,145 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-09 17:48:50,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=121600.0, ans=0.125 2024-08-09 17:48:54,393 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-09 17:49:07,557 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
23 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 17:49:26,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=121800.0, ans=0.125 2024-08-09 17:49:45,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.869e+01 3.277e+01 4.136e+01 6.270e+01, threshold=6.555e+01, percent-clipped=0.0 2024-08-09 17:49:46,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12200, loss[loss=0.1124, beats_loss=0.01639, ecapa_loss=0.0002719, whisper_loss=0.09332, over 22492.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01326, ecapa_loss=0.0003909, whisper_loss=0.1028, over 3912892.02 frames. ], batch size: 88, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:49:46,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=122000.0, ans=0.125 2024-08-09 17:49:53,515 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 17:49:58,573 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-09 17:50:06,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. limit=10.0 2024-08-09 17:50:15,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-09 17:50:27,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.34 vs. 
limit=15.0 2024-08-09 17:50:31,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=122300.0, ans=0.0 2024-08-09 17:51:01,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12250, loss[loss=0.1058, beats_loss=0.01373, ecapa_loss=0.0004249, whisper_loss=0.0878, over 21860.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01315, ecapa_loss=0.0003906, whisper_loss=0.1025, over 3909841.08 frames. ], batch size: 94, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:51:31,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=122700.0, ans=0.2 2024-08-09 17:52:05,574 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-09 17:52:17,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.887e+01 3.272e+01 4.030e+01 7.099e+01, threshold=6.544e+01, percent-clipped=1.0 2024-08-09 17:52:17,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12300, loss[loss=0.1054, beats_loss=0.01367, ecapa_loss=0.000347, whisper_loss=0.08829, over 19729.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01309, ecapa_loss=0.0003901, whisper_loss=0.1028, over 3891664.02 frames. ], batch size: 80, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:52:21,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=123000.0, ans=0.125 2024-08-09 17:52:27,088 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 17:52:28,559 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 17:52:28,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=123000.0, ans=0.035 2024-08-09 17:52:45,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=12.0 2024-08-09 17:52:48,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=123200.0, ans=0.2 2024-08-09 17:53:04,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=123300.0, ans=0.125 2024-08-09 17:53:13,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=123300.0, ans=0.0 2024-08-09 17:53:28,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=15.0 2024-08-09 17:53:31,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12350, loss[loss=0.1193, beats_loss=0.01106, ecapa_loss=0.0004169, whisper_loss=0.1041, over 15732.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01309, ecapa_loss=0.0003897, whisper_loss=0.1036, over 3896860.07 frames. ], batch size: 62, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:53:32,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=123500.0, ans=0.125 2024-08-09 17:53:50,557 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 17:54:06,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123700.0, ans=0.1 2024-08-09 17:54:21,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=123800.0, ans=0.0 2024-08-09 17:54:32,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-09 17:54:35,648 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-09 17:54:37,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-09 17:54:48,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.013e+01 3.404e+01 4.023e+01 7.879e+01, threshold=6.808e+01, percent-clipped=3.0 2024-08-09 17:54:48,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12400, loss[loss=0.128, beats_loss=0.01309, ecapa_loss=0.0003176, whisper_loss=0.1117, over 21006.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01315, ecapa_loss=0.0003855, whisper_loss=0.1034, over 3898148.49 frames. ], batch size: 79, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:55:15,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124100.0, ans=0.125 2024-08-09 17:55:19,584 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 17:55:51,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=15.0 2024-08-09 17:56:00,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12450, loss[loss=0.1024, beats_loss=0.01352, ecapa_loss=0.0004595, whisper_loss=0.08425, over 17331.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01309, ecapa_loss=0.0003872, whisper_loss=0.1029, over 3881182.49 frames. ], batch size: 72, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:56:06,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:06,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:14,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=124600.0, ans=0.05 2024-08-09 17:56:15,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.01 vs. limit=5.0 2024-08-09 17:56:16,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=124600.0, ans=0.2 2024-08-09 17:57:14,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.994e+01 3.498e+01 4.030e+01 6.153e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-09 17:57:14,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12500, loss[loss=0.1376, beats_loss=0.01304, ecapa_loss=0.0003534, whisper_loss=0.121, over 23930.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01301, ecapa_loss=0.0003855, whisper_loss=0.1034, over 3875542.94 frames. ], batch size: 93, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:57:14,447 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 17:57:35,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=125100.0, ans=0.125 2024-08-09 17:57:39,624 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 17:57:46,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=125200.0, ans=0.5 2024-08-09 17:58:01,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=125300.0, ans=0.125 2024-08-09 17:58:05,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=125300.0, ans=0.125 2024-08-09 17:58:09,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-09 17:58:13,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=125400.0, ans=0.2 2024-08-09 17:58:20,092 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 17:58:28,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12550, loss[loss=0.1034, beats_loss=0.01335, ecapa_loss=0.0003974, whisper_loss=0.08611, over 21634.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01304, ecapa_loss=0.000388, whisper_loss=0.103, over 3846951.24 frames. ], batch size: 90, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:58:32,237 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 28 from Vox, 33 from AS 2024-08-09 17:58:51,436 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.459e+02 2024-08-09 17:58:51,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=125600.0, ans=0.125 2024-08-09 17:59:10,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=125700.0, ans=0.125 2024-08-09 17:59:11,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=125700.0, ans=0.04949747468305833 2024-08-09 17:59:14,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=125800.0, ans=0.2 2024-08-09 17:59:19,987 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 from AS 2024-08-09 17:59:24,815 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 from AS 2024-08-09 17:59:27,500 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS 2024-08-09 17:59:30,128 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 from AS 2024-08-09 17:59:33,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=125900.0, ans=0.125 2024-08-09 17:59:40,553 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 21 from Vox, 48 from AS 2024-08-09 17:59:43,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.067e+01 3.520e+01 4.433e+01 6.633e+01, threshold=7.039e+01, percent-clipped=0.0 2024-08-09 17:59:43,283 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12600, loss[loss=0.1162, beats_loss=0.01239, ecapa_loss=0.0004065, whisper_loss=0.09972, over 19444.00 frames. 
], tot_loss[loss=0.1208, beats_loss=0.01298, ecapa_loss=0.0003885, whisper_loss=0.1039, over 3868199.70 frames. ], batch size: 81, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:59:55,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=126000.0, ans=0.07 2024-08-09 18:00:07,814 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 18:00:16,698 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 26 from Vox, 36 from AS 2024-08-09 18:00:27,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126300.0, ans=0.125 2024-08-09 18:00:28,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=126300.0, ans=0.0 2024-08-09 18:00:36,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=126300.0, ans=0.0 2024-08-09 18:00:41,724 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 18:00:55,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12650, loss[loss=0.1063, beats_loss=0.009215, ecapa_loss=0.0004418, whisper_loss=0.09267, over 14575.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01301, ecapa_loss=0.0003866, whisper_loss=0.1039, over 3861968.13 frames. ], batch size: 55, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:00:59,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=126500.0, ans=0.2 2024-08-09 18:01:03,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=126500.0, ans=0.125 2024-08-09 18:01:23,063 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
28 from LS+wenet, 17 from Vox, 27 from AS 2024-08-09 18:01:27,545 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-09 18:01:33,554 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 18 from Vox, 33 from AS 2024-08-09 18:01:37,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=126700.0, ans=0.5 2024-08-09 18:01:38,524 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 from AS 2024-08-09 18:01:52,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=126800.0, ans=0.0 2024-08-09 18:01:58,903 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 from AS 2024-08-09 18:02:04,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2024-08-09 18:02:08,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.930e+01 3.194e+01 3.853e+01 8.153e+01, threshold=6.388e+01, percent-clipped=1.0 2024-08-09 18:02:08,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12700, loss[loss=0.133, beats_loss=0.01268, ecapa_loss=0.0003778, whisper_loss=0.1165, over 15958.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01297, ecapa_loss=0.0003899, whisper_loss=0.1037, over 3846136.37 frames. ], batch size: 64, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:02:15,872 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 from AS 2024-08-09 18:02:20,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=127000.0, ans=0.125 2024-08-09 18:02:39,607 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
17 from LS+wenet, 25 from Vox, 37 from AS 2024-08-09 18:02:54,198 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS 2024-08-09 18:02:58,247 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-09 18:03:09,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=127400.0, ans=0.125 2024-08-09 18:03:15,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=127400.0, ans=0.2 2024-08-09 18:03:20,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=127400.0, ans=0.125 2024-08-09 18:03:22,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12750, loss[loss=0.1269, beats_loss=0.01295, ecapa_loss=0.0004168, whisper_loss=0.1097, over 21917.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.013, ecapa_loss=0.0003887, whisper_loss=0.1035, over 3860487.57 frames. ], batch size: 88, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:03:26,053 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-09 18:03:35,051 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 from AS 2024-08-09 18:03:59,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=127700.0, ans=15.0 2024-08-09 18:04:00,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=127700.0, ans=0.0 2024-08-09 18:04:26,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.81 vs. 
limit=10.0 2024-08-09 18:04:28,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=127900.0, ans=0.0 2024-08-09 18:04:30,837 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 18:04:33,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.049e+01 3.510e+01 3.985e+01 5.812e+01, threshold=7.020e+01, percent-clipped=0.0 2024-08-09 18:04:33,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12800, loss[loss=0.1415, beats_loss=0.01006, ecapa_loss=0.0004883, whisper_loss=0.1265, over 18168.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01295, ecapa_loss=0.0003886, whisper_loss=0.104, over 3840644.99 frames. ], batch size: 76, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:04:33,514 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 from AS 2024-08-09 18:04:37,515 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 18:04:37,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=128000.0, ans=0.2 2024-08-09 18:04:37,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=128000.0, ans=0.125 2024-08-09 18:04:46,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128100.0, ans=0.1 2024-08-09 18:04:51,758 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS 2024-08-09 18:04:56,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=128100.0, ans=0.0 2024-08-09 18:04:57,016 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
30 from LS+wenet, 18 from Vox, 38 from AS 2024-08-09 18:05:01,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=128200.0, ans=0.025 2024-08-09 18:05:09,433 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 from AS 2024-08-09 18:05:26,704 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 from AS 2024-08-09 18:05:30,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=128400.0, ans=0.125 2024-08-09 18:05:35,601 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 from AS 2024-08-09 18:05:38,664 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 from AS 2024-08-09 18:05:44,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12850, loss[loss=0.1221, beats_loss=0.01382, ecapa_loss=0.0003039, whisper_loss=0.1053, over 20263.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01293, ecapa_loss=0.0003892, whisper_loss=0.1038, over 3823089.33 frames. ], batch size: 78, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:06:19,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2024-08-09 18:06:21,222 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 from AS 2024-08-09 18:06:34,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.32 vs. 
limit=10.0 2024-08-09 18:06:36,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=128800.0, ans=0.125 2024-08-09 18:06:57,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.723e+01 3.295e+01 4.012e+01 6.106e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-09 18:06:57,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12900, loss[loss=0.1302, beats_loss=0.01202, ecapa_loss=0.0003674, whisper_loss=0.1145, over 22322.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01297, ecapa_loss=0.000389, whisper_loss=0.1036, over 3852097.80 frames. ], batch size: 89, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:07:09,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129000.0, ans=0.125 2024-08-09 18:07:11,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=129100.0, ans=0.125 2024-08-09 18:07:29,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=129200.0, ans=0.125 2024-08-09 18:07:44,879 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 21 from Vox, 46 from AS 2024-08-09 18:07:53,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2024-08-09 18:07:55,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-09 18:07:59,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.98 vs. 
limit=15.0 2024-08-09 18:08:07,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=129500.0, ans=0.09899494936611666 2024-08-09 18:08:08,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 12950, loss[loss=0.1442, beats_loss=0.01102, ecapa_loss=0.000376, whisper_loss=0.1294, over 23137.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01296, ecapa_loss=0.0003885, whisper_loss=0.1033, over 3867598.40 frames. ], batch size: 90, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:08:30,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=129600.0, ans=0.0 2024-08-09 18:08:36,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=129700.0, ans=0.0 2024-08-09 18:08:36,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=129700.0, ans=0.125 2024-08-09 18:08:36,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129700.0, ans=0.1 2024-08-09 18:08:52,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=129800.0, ans=0.125 2024-08-09 18:08:54,904 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 17 from Vox, 31 from AS 2024-08-09 18:09:00,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=129800.0, ans=0.0 2024-08-09 18:09:09,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=129900.0, ans=0.0 2024-08-09 18:09:24,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 3.005e+01 3.464e+01 3.958e+01 5.866e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 18:09:24,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13000, loss[loss=0.1337, beats_loss=0.01564, ecapa_loss=0.0003314, whisper_loss=0.1148, over 19862.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01298, ecapa_loss=0.0003858, whisper_loss=0.1036, over 3870908.64 frames. ], batch size: 80, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:09:35,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=130000.0, ans=0.1 2024-08-09 18:09:40,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=130100.0, ans=0.125 2024-08-09 18:09:42,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.57 vs. limit=22.5 2024-08-09 18:09:43,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.63 vs. limit=22.5 2024-08-09 18:09:52,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-08-09 18:09:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=130200.0, ans=0.125 2024-08-09 18:10:05,869 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
22 from LS+wenet, 26 from Vox, 32 from AS 2024-08-09 18:10:07,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=130300.0, ans=0.05 2024-08-09 18:10:08,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2024-08-09 18:10:27,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=130400.0, ans=0.125 2024-08-09 18:10:38,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13050, loss[loss=0.117, beats_loss=0.01396, ecapa_loss=0.000406, whisper_loss=0.099, over 20453.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01298, ecapa_loss=0.0003872, whisper_loss=0.1036, over 3860919.24 frames. ], batch size: 86, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:10:45,768 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 from AS 2024-08-09 18:10:46,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=130500.0, ans=0.125 2024-08-09 18:10:56,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130600.0, ans=0.1 2024-08-09 18:11:01,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=130600.0, ans=0.125 2024-08-09 18:11:06,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=130600.0, ans=0.125 2024-08-09 18:11:16,568 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 29 from Vox, 37 from AS 2024-08-09 18:11:22,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130700.0, ans=0.1 2024-08-09 18:11:28,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=130700.0, ans=0.125 2024-08-09 18:11:37,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.41 vs. limit=15.0 2024-08-09 18:11:41,183 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 28 from Vox, 28 from AS 2024-08-09 18:11:43,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=130800.0, ans=0.2 2024-08-09 18:12:08,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.885e+01 3.590e+01 4.189e+01 8.103e+01, threshold=7.179e+01, percent-clipped=1.0 2024-08-09 18:12:08,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13100, loss[loss=0.108, beats_loss=0.01383, ecapa_loss=0.0003978, whisper_loss=0.0902, over 20074.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.013, ecapa_loss=0.0003862, whisper_loss=0.1036, over 3891472.66 frames. ], batch size: 84, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:12:12,899 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 18:12:21,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=131100.0, ans=0.2 2024-08-09 18:12:36,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=131200.0, ans=0.125 2024-08-09 18:12:43,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.23 vs. 
limit=12.0 2024-08-09 18:12:49,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.21 vs. limit=10.0 2024-08-09 18:13:03,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=15.0 2024-08-09 18:13:41,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13150, loss[loss=0.1084, beats_loss=0.01597, ecapa_loss=0.0003439, whisper_loss=0.08901, over 17340.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.0131, ecapa_loss=0.0003842, whisper_loss=0.103, over 3884600.71 frames. ], batch size: 72, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:13:47,324 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 from AS 2024-08-09 18:14:03,563 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 from AS 2024-08-09 18:14:12,447 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 20 from Vox, 51 from AS 2024-08-09 18:14:16,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=131600.0, ans=0.125 2024-08-09 18:14:42,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131700.0, ans=0.1 2024-08-09 18:14:45,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=131800.0, ans=0.04949747468305833 2024-08-09 18:14:49,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=131800.0, ans=0.125 2024-08-09 18:14:54,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=131800.0, ans=0.04949747468305833 2024-08-09 18:14:59,465 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 from AS 2024-08-09 18:15:07,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=131900.0, ans=0.0 2024-08-09 18:15:10,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=131900.0, ans=0.1 2024-08-09 18:15:31,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.963e+01 3.357e+01 4.080e+01 6.559e+01, threshold=6.714e+01, percent-clipped=0.0 2024-08-09 18:15:31,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13200, loss[loss=0.1141, beats_loss=0.01759, ecapa_loss=0.0003822, whisper_loss=0.09266, over 21304.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01312, ecapa_loss=0.0003815, whisper_loss=0.1035, over 3878621.79 frames. 
], batch size: 88, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:15:47,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=132000.0, ans=0.0 2024-08-09 18:15:57,988 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS 2024-08-09 18:16:24,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=132200.0, ans=0.125 2024-08-09 18:16:35,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=132300.0, ans=0.0 2024-08-09 18:16:51,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=132300.0, ans=0.2 2024-08-09 18:17:15,219 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 from AS 2024-08-09 18:17:16,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13250, loss[loss=0.1353, beats_loss=0.009516, ecapa_loss=0.0003493, whisper_loss=0.1223, over 15782.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01304, ecapa_loss=0.0003844, whisper_loss=0.1041, over 3876460.06 frames. ], batch size: 58, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:17:24,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=132500.0, ans=0.2 2024-08-09 18:17:27,039 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-09 18:17:31,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132500.0, ans=0.1 2024-08-09 18:17:33,458 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
32 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 18:18:01,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-08-09 18:18:09,773 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 18:18:20,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=132800.0, ans=0.05 2024-08-09 18:18:20,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=132800.0, ans=0.05 2024-08-09 18:18:40,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.970e+01 3.375e+01 4.348e+01 9.574e+01, threshold=6.749e+01, percent-clipped=3.0 2024-08-09 18:18:40,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13300, loss[loss=0.09485, beats_loss=0.01328, ecapa_loss=0.0004123, whisper_loss=0.07745, over 19357.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01294, ecapa_loss=0.0003862, whisper_loss=0.1041, over 3862094.20 frames. ], batch size: 78, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:19:22,318 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
26 from LS+wenet, 11 from Vox, 22 from AS 2024-08-09 18:19:24,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=133300.0, ans=0.05 2024-08-09 18:19:27,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=133300.0, ans=0.5 2024-08-09 18:19:27,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=133300.0, ans=0.125 2024-08-09 18:19:44,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2024-08-09 18:19:44,965 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 17 from Vox, 39 from AS 2024-08-09 18:19:50,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13350, loss[loss=0.1471, beats_loss=0.0115, ecapa_loss=0.0003627, whisper_loss=0.132, over 20109.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01302, ecapa_loss=0.000382, whisper_loss=0.1043, over 3860277.70 frames. ], batch size: 78, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:19:52,223 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 18:19:55,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2024-08-09 18:20:00,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.33 vs. limit=10.0 2024-08-09 18:20:08,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2024-08-09 18:20:08,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5 2024-08-09 18:20:17,072 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-09 18:20:18,474 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 18:20:20,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=133700.0, ans=0.04949747468305833 2024-08-09 18:20:33,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2024-08-09 18:20:36,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-08-09 18:20:46,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=133800.0, ans=0.0 2024-08-09 18:20:57,474 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 from AS 2024-08-09 18:20:59,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=133900.0, ans=0.2 2024-08-09 18:21:03,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 3.029e+01 3.343e+01 3.897e+01 6.977e+01, threshold=6.687e+01, percent-clipped=1.0 2024-08-09 18:21:03,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13400, loss[loss=0.1161, beats_loss=0.01489, ecapa_loss=0.0003055, whisper_loss=0.09812, over 22928.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01299, ecapa_loss=0.0003795, whisper_loss=0.1041, over 3852163.23 frames. 
], batch size: 94, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:21:34,705 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 18:21:43,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134200.0, ans=0.1 2024-08-09 18:21:54,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=134300.0, ans=0.125 2024-08-09 18:21:54,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-08-09 18:22:05,320 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-09 18:22:13,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13450, loss[loss=0.1296, beats_loss=0.01357, ecapa_loss=0.0003268, whisper_loss=0.1128, over 17373.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01307, ecapa_loss=0.0003776, whisper_loss=0.1036, over 3887896.43 frames. ], batch size: 67, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:22:14,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2024-08-09 18:22:17,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=134500.0, ans=0.125 2024-08-09 18:22:24,897 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 18 from Vox, 27 from AS 2024-08-09 18:22:33,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.15 vs. 
limit=15.0 2024-08-09 18:22:36,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=134600.0, ans=0.0 2024-08-09 18:22:48,848 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 18:23:04,209 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 18:23:23,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.856e+01 3.489e+01 4.024e+01 6.380e+01, threshold=6.978e+01, percent-clipped=0.0 2024-08-09 18:23:23,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13500, loss[loss=0.1187, beats_loss=0.01485, ecapa_loss=0.0003745, whisper_loss=0.1001, over 21787.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01308, ecapa_loss=0.0003794, whisper_loss=0.1037, over 3894977.60 frames. ], batch size: 90, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:23:25,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2024-08-09 18:23:36,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=135100.0, ans=0.125 2024-08-09 18:23:36,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-09 18:23:57,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=135200.0, ans=0.2 2024-08-09 18:24:13,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.78 vs. 
limit=15.0 2024-08-09 18:24:17,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=135300.0, ans=0.125 2024-08-09 18:24:34,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13550, loss[loss=0.1176, beats_loss=0.01066, ecapa_loss=0.000457, whisper_loss=0.1024, over 21374.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01304, ecapa_loss=0.0003811, whisper_loss=0.1031, over 3894160.60 frames. ], batch size: 90, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:24:39,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=135500.0, ans=0.125 2024-08-09 18:25:24,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.05 vs. limit=15.0 2024-08-09 18:25:27,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2024-08-09 18:25:35,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=135900.0, ans=0.5 2024-08-09 18:25:47,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 3.070e+01 3.576e+01 4.104e+01 5.875e+01, threshold=7.153e+01, percent-clipped=0.0 2024-08-09 18:25:47,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13600, loss[loss=0.1204, beats_loss=0.01313, ecapa_loss=0.0003506, whisper_loss=0.1037, over 20950.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01304, ecapa_loss=0.0003788, whisper_loss=0.1024, over 3902320.90 frames. ], batch size: 83, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:26:19,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=136200.0, ans=0.1 2024-08-09 18:26:36,579 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 18:26:36,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=136300.0, ans=0.1 2024-08-09 18:26:41,882 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 18:26:56,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=136400.0, ans=0.1 2024-08-09 18:26:58,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13650, loss[loss=0.1075, beats_loss=0.01492, ecapa_loss=0.0003761, whisper_loss=0.08878, over 14641.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01311, ecapa_loss=0.0003768, whisper_loss=0.1026, over 3908317.84 frames. ], batch size: 59, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:27:08,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=136500.0, ans=0.1 2024-08-09 18:27:22,895 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-09 18:27:23,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=136600.0, ans=0.2 2024-08-09 18:27:39,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=136700.0, ans=0.0 2024-08-09 18:27:53,105 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 18:27:57,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-09 18:27:59,736 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 18:28:01,273 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 18:28:09,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.873e+01 3.233e+01 3.836e+01 5.786e+01, threshold=6.466e+01, percent-clipped=0.0 2024-08-09 18:28:09,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13700, loss[loss=0.1055, beats_loss=0.01326, ecapa_loss=0.0003476, whisper_loss=0.08872, over 20806.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01315, ecapa_loss=0.0003758, whisper_loss=0.102, over 3893080.51 frames. ], batch size: 82, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:28:14,760 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.142e+00 2024-08-09 18:28:15,628 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 18:28:17,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=137000.0, ans=0.0 2024-08-09 18:28:19,847 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-09 18:28:36,285 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.179e+00 2024-08-09 18:28:37,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=137200.0, ans=0.125 2024-08-09 18:28:58,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137300.0, ans=0.1 2024-08-09 18:29:20,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13750, loss[loss=0.1098, beats_loss=0.01376, ecapa_loss=0.0003355, whisper_loss=0.09271, over 19204.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01317, ecapa_loss=0.0003747, whisper_loss=0.1024, over 3891178.72 frames. 
], batch size: 75, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:29:22,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=137500.0, ans=0.125 2024-08-09 18:29:43,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=137600.0, ans=0.2 2024-08-09 18:29:47,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=137700.0, ans=0.125 2024-08-09 18:29:54,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=137700.0, ans=0.125 2024-08-09 18:30:14,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137900.0, ans=0.125 2024-08-09 18:30:28,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.986e+01 3.490e+01 4.118e+01 8.159e+01, threshold=6.980e+01, percent-clipped=6.0 2024-08-09 18:30:28,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13800, loss[loss=0.1208, beats_loss=0.01297, ecapa_loss=0.000356, whisper_loss=0.1043, over 18436.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.0131, ecapa_loss=0.000373, whisper_loss=0.1029, over 3888649.06 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:30:36,136 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:31:01,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-08-09 18:31:09,132 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 18:31:12,786 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 18:31:36,142 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 18:31:37,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13850, loss[loss=0.1241, beats_loss=0.01046, ecapa_loss=0.0003209, whisper_loss=0.1104, over 18424.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01306, ecapa_loss=0.0003719, whisper_loss=0.103, over 3878161.70 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:31:42,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=138500.0, ans=0.125 2024-08-09 18:31:42,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138500.0, ans=0.0 2024-08-09 18:31:54,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=138600.0, ans=0.125 2024-08-09 18:31:59,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=138600.0, ans=0.125 2024-08-09 18:32:02,442 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 18:32:02,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=138600.0, ans=0.5 2024-08-09 18:32:13,903 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:32:16,119 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 18:32:22,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=138800.0, ans=0.125 2024-08-09 18:32:49,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=139000.0, ans=0.0 2024-08-09 18:32:49,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.816e+01 3.337e+01 3.813e+01 6.629e+01, threshold=6.673e+01, percent-clipped=0.0 2024-08-09 18:32:49,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13900, loss[loss=0.1245, beats_loss=0.0147, ecapa_loss=0.0003519, whisper_loss=0.1063, over 22095.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01303, ecapa_loss=0.0003735, whisper_loss=0.1031, over 3896725.54 frames. ], batch size: 89, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:32:57,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=139000.0, ans=0.0 2024-08-09 18:33:21,861 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-09 18:33:34,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=139300.0, ans=0.2 2024-08-09 18:34:00,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 13950, loss[loss=0.1315, beats_loss=0.01398, ecapa_loss=0.0003628, whisper_loss=0.1139, over 22037.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.013, ecapa_loss=0.0003759, whisper_loss=0.1034, over 3912483.98 frames. ], batch size: 88, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:34:36,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.63 vs. limit=10.0 2024-08-09 18:34:52,512 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 18:35:02,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=139900.0, ans=0.1 2024-08-09 18:35:09,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.055e+01 3.459e+01 4.049e+01 5.260e+01, threshold=6.917e+01, percent-clipped=0.0 2024-08-09 18:35:09,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14000, loss[loss=0.09756, beats_loss=0.01444, ecapa_loss=0.0002866, whisper_loss=0.08025, over 15144.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01303, ecapa_loss=0.0003729, whisper_loss=0.1032, over 3916942.41 frames. ], batch size: 57, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:35:15,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140000.0, ans=0.1 2024-08-09 18:35:26,468 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 18:35:33,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=140100.0, ans=0.025 2024-08-09 18:35:41,750 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 18:35:51,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=140300.0, ans=0.125 2024-08-09 18:36:15,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2024-08-09 18:36:16,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.33 vs. 
limit=22.5 2024-08-09 18:36:18,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14050, loss[loss=0.1211, beats_loss=0.01264, ecapa_loss=0.0004005, whisper_loss=0.1045, over 22315.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01296, ecapa_loss=0.0003728, whisper_loss=0.1038, over 3920118.34 frames. ], batch size: 87, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:36:23,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=140500.0, ans=0.2 2024-08-09 18:36:30,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140500.0, ans=0.1 2024-08-09 18:36:32,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=140600.0, ans=0.07 2024-08-09 18:36:40,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2024-08-09 18:36:52,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=140700.0, ans=0.125 2024-08-09 18:36:55,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=140700.0, ans=0.125 2024-08-09 18:37:05,033 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 18:37:17,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=140900.0, ans=0.1 2024-08-09 18:37:27,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.062e+01 3.430e+01 4.130e+01 6.899e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 18:37:27,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14100, loss[loss=0.1355, beats_loss=0.01151, ecapa_loss=0.0003958, whisper_loss=0.1201, over 15879.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01291, ecapa_loss=0.0003725, whisper_loss=0.1043, over 3928650.73 frames. ], batch size: 65, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:37:31,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141000.0, ans=0.1 2024-08-09 18:37:48,657 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 18:37:54,182 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-09 18:37:58,260 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 18:38:07,076 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 18:38:09,940 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-09 18:38:37,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14150, loss[loss=0.1546, beats_loss=0.009165, ecapa_loss=0.0004012, whisper_loss=0.1414, over 20992.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01292, ecapa_loss=0.0003695, whisper_loss=0.1049, over 3947129.02 frames. ], batch size: 79, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:38:37,644 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 18:38:46,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=141500.0, ans=0.025 2024-08-09 18:39:02,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141600.0, ans=0.125 2024-08-09 18:39:04,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=141700.0, ans=0.0 2024-08-09 18:39:12,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0 2024-08-09 18:39:12,812 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:39:16,266 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 18:39:21,427 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-09 18:39:25,851 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 18:39:30,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=141800.0, ans=0.0 2024-08-09 18:39:34,547 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 18:39:41,727 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 18:39:48,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.107e+01 3.530e+01 4.182e+01 6.705e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-09 18:39:48,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14200, loss[loss=0.115, beats_loss=0.01376, ecapa_loss=0.0003574, whisper_loss=0.09766, over 22160.00 frames. 
], tot_loss[loss=0.1202, beats_loss=0.01308, ecapa_loss=0.0003662, whisper_loss=0.1034, over 3941267.76 frames. ], batch size: 90, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:40:17,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=142200.0, ans=10.0 2024-08-09 18:40:24,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-08-09 18:40:30,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=142200.0, ans=0.0 2024-08-09 18:40:32,182 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-09 18:40:35,332 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-09 18:40:45,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=142300.0, ans=0.125 2024-08-09 18:40:56,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=142400.0, ans=0.125 2024-08-09 18:41:04,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14250, loss[loss=0.1398, beats_loss=0.01037, ecapa_loss=0.0003903, whisper_loss=0.1255, over 19715.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01299, ecapa_loss=0.0003692, whisper_loss=0.1035, over 3917945.69 frames. ], batch size: 77, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:41:07,581 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 18:41:18,603 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 18:41:18,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=142600.0, ans=0.0 2024-08-09 18:41:24,473 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-09 18:41:48,564 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 18:42:11,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=142900.0, ans=0.125 2024-08-09 18:42:19,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.991e+01 3.300e+01 4.002e+01 6.725e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-09 18:42:19,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14300, loss[loss=0.1038, beats_loss=0.01322, ecapa_loss=0.0003693, whisper_loss=0.08691, over 21531.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01307, ecapa_loss=0.0003665, whisper_loss=0.1029, over 3915243.32 frames. ], batch size: 90, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:42:28,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=143000.0, ans=0.125 2024-08-09 18:42:37,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=143100.0, ans=0.125 2024-08-09 18:43:07,176 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-09 18:43:07,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143300.0, ans=0.1 2024-08-09 18:43:07,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. 
limit=15.0 2024-08-09 18:43:13,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143300.0, ans=0.125 2024-08-09 18:43:33,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14350, loss[loss=0.102, beats_loss=0.01202, ecapa_loss=0.0004225, whisper_loss=0.08573, over 18976.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01307, ecapa_loss=0.000366, whisper_loss=0.1028, over 3904355.27 frames. ], batch size: 80, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:43:47,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=143600.0, ans=0.0 2024-08-09 18:43:48,374 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:43:58,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=143600.0, ans=0.2 2024-08-09 18:44:26,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-09 18:44:44,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=143900.0, ans=0.125 2024-08-09 18:44:48,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.978e+01 3.379e+01 3.872e+01 1.013e+02, threshold=6.758e+01, percent-clipped=3.0 2024-08-09 18:44:48,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14400, loss[loss=0.1121, beats_loss=0.01412, ecapa_loss=0.000378, whisper_loss=0.09415, over 18908.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01303, ecapa_loss=0.0003697, whisper_loss=0.1025, over 3901106.26 frames. ], batch size: 74, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:44:56,245 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 18:45:05,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=144100.0, ans=0.125 2024-08-09 18:45:12,654 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 18:45:14,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2024-08-09 18:45:20,932 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 18:45:24,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=144200.0, ans=0.1 2024-08-09 18:45:34,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=144300.0, ans=0.0 2024-08-09 18:45:45,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=144400.0, ans=0.95 2024-08-09 18:45:53,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=144400.0, ans=0.0 2024-08-09 18:45:57,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2024-08-09 18:46:01,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 1, batch 14450, loss[loss=0.1085, beats_loss=0.01552, ecapa_loss=0.0003133, whisper_loss=0.08987, over 22082.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01305, ecapa_loss=0.0003699, whisper_loss=0.1027, over 3879963.01 frames. ], batch size: 89, lr: 3.05e-02, grad_scale: 4096.0 2024-08-09 18:46:06,275 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 18:46:07,557 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-09 18:46:25,913 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-09 18:46:26,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144600.0, ans=0.1 2024-08-09 18:46:44,919 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 18:46:50,490 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-09 18:46:54,250 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-09 18:47:01,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=144900.0, ans=0.0 2024-08-09 18:47:50,197 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 18:47:51,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 0, loss[loss=0.08408, beats_loss=0.01555, ecapa_loss=0.0004696, whisper_loss=0.06384, over 14075.00 frames. ], tot_loss[loss=0.08408, beats_loss=0.01555, ecapa_loss=0.0004696, whisper_loss=0.06384, over 14075.00 frames. ], batch size: 59, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:47:51,906 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 18:48:33,895 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on ASR_libri: loss=0.287, beats_loss=0, ecapa_loss=0.001066, whisper_loss=0.2763, over 922467.00 frames. 2024-08-09 18:48:50,295 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on SV_voxceleb1: loss=0.009611, beats_loss=0, ecapa_loss=0.0009611, whisper_loss=0, over 939242.00 frames. 
2024-08-09 18:50:53,519 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on AT_audioset: loss=0.0306, beats_loss=0.0306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 18:50:53,527 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 18:50:56,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.997e+01 3.426e+01 4.261e+01 6.161e+01, threshold=6.853e+01, percent-clipped=0.0 2024-08-09 18:51:36,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=145080.0, ans=0.035 2024-08-09 18:51:42,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=145080.0, ans=0.125 2024-08-09 18:51:45,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=145080.0, ans=0.125 2024-08-09 18:52:51,753 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 18:53:03,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 50, loss[loss=0.1071, beats_loss=0.01318, ecapa_loss=0.0003904, whisper_loss=0.09, over 20605.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01314, ecapa_loss=0.0003769, whisper_loss=0.1001, over 866685.75 frames. ], batch size: 85, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:53:14,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=145480.0, ans=0.2 2024-08-09 18:53:41,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. 
limit=22.5 2024-08-09 18:53:46,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=145580.0, ans=0.125 2024-08-09 18:53:54,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.98 vs. limit=22.5 2024-08-09 18:53:58,838 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 18:54:11,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145680.0, ans=0.0 2024-08-09 18:54:35,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=145780.0, ans=0.2 2024-08-09 18:54:48,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=15.0 2024-08-09 18:54:51,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-08-09 18:55:03,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 100, loss[loss=0.1187, beats_loss=0.01094, ecapa_loss=0.0004166, whisper_loss=0.1036, over 14713.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0131, ecapa_loss=0.0003719, whisper_loss=0.1, over 1498963.91 frames. ], batch size: 59, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:55:07,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.227e+01 3.507e+01 4.114e+01 7.130e+01, threshold=7.014e+01, percent-clipped=1.0 2024-08-09 18:55:11,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-09 18:55:12,492 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 18:55:17,928 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:55:19,186 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-09 18:55:40,986 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-09 18:55:52,962 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-09 18:56:08,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5 2024-08-09 18:56:20,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=146280.0, ans=0.0 2024-08-09 18:56:24,298 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 18:56:39,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=146380.0, ans=0.2 2024-08-09 18:56:53,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 150, loss[loss=0.1327, beats_loss=0.01215, ecapa_loss=0.0003637, whisper_loss=0.1169, over 21517.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01309, ecapa_loss=0.0003642, whisper_loss=0.1021, over 2007405.05 frames. ], batch size: 84, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:57:10,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=146580.0, ans=0.0 2024-08-09 18:58:09,366 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
25 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-09 18:58:11,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=146880.0, ans=0.0 2024-08-09 18:58:20,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 200, loss[loss=0.1182, beats_loss=0.01618, ecapa_loss=0.00032, whisper_loss=0.09878, over 21470.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01291, ecapa_loss=0.0003609, whisper_loss=0.1029, over 2418432.56 frames. ], batch size: 87, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:58:23,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.970e+01 3.444e+01 4.293e+01 6.916e+01, threshold=6.888e+01, percent-clipped=0.0 2024-08-09 18:58:43,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=147080.0, ans=0.1 2024-08-09 18:58:44,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=147080.0, ans=0.0 2024-08-09 18:58:53,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=147180.0, ans=0.125 2024-08-09 18:59:10,136 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 18:59:12,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147280.0, ans=0.125 2024-08-09 18:59:24,855 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 18:59:39,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 250, loss[loss=0.08622, beats_loss=0.01668, ecapa_loss=0.0002535, whisper_loss=0.067, over 14249.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01297, ecapa_loss=0.0003543, whisper_loss=0.102, over 2710470.68 frames. 
], batch size: 59, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:59:51,383 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 19:00:09,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=147680.0, ans=0.125 2024-08-09 19:00:27,096 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 19:00:45,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=147880.0, ans=0.125 2024-08-09 19:00:51,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=147880.0, ans=0.0 2024-08-09 19:00:54,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 300, loss[loss=0.1327, beats_loss=0.01, ecapa_loss=0.0003724, whisper_loss=0.119, over 14416.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01281, ecapa_loss=0.0003556, whisper_loss=0.1026, over 2945022.63 frames. ], batch size: 56, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 19:00:57,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 3.134e+01 3.449e+01 4.098e+01 7.776e+01, threshold=6.897e+01, percent-clipped=1.0 2024-08-09 19:01:04,756 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 19:01:18,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=148080.0, ans=0.0 2024-08-09 19:01:22,587 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 19:01:39,851 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:01:44,881 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 19:01:56,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-08-09 19:01:59,643 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 19:02:02,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=148380.0, ans=0.125 2024-08-09 19:02:05,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=148380.0, ans=0.025 2024-08-09 19:02:08,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 350, loss[loss=0.08542, beats_loss=0.01659, ecapa_loss=0.0003278, whisper_loss=0.06555, over 17945.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01274, ecapa_loss=0.0003541, whisper_loss=0.1016, over 3111410.96 frames. ], batch size: 76, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:02:16,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=148480.0, ans=0.125 2024-08-09 19:02:24,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=148580.0, ans=0.0 2024-08-09 19:02:49,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148680.0, ans=0.1 2024-08-09 19:02:57,965 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 19:03:02,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=148780.0, ans=0.125 2024-08-09 19:03:15,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=148880.0, ans=0.2 2024-08-09 19:03:23,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 400, loss[loss=0.1037, beats_loss=0.01294, ecapa_loss=0.0003462, whisper_loss=0.08727, over 18502.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01266, ecapa_loss=0.0003537, whisper_loss=0.1025, over 3265668.74 frames. ], batch size: 74, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:03:25,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.813e+01 3.235e+01 3.879e+01 6.977e+01, threshold=6.469e+01, percent-clipped=1.0 2024-08-09 19:03:29,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=148980.0, ans=0.125 2024-08-09 19:03:33,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=148980.0, ans=0.0 2024-08-09 19:03:33,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2024-08-09 19:03:41,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-09 19:03:55,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=149180.0, ans=0.0 2024-08-09 19:04:03,903 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 19:04:04,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=149180.0, ans=15.0 2024-08-09 19:04:21,107 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-09 19:04:22,652 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 19:04:23,979 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 19:04:26,833 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 19:04:27,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149380.0, ans=0.0 2024-08-09 19:04:31,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=12.0 2024-08-09 19:04:38,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 450, loss[loss=0.1392, beats_loss=0.01249, ecapa_loss=0.0002612, whisper_loss=0.1241, over 17226.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01269, ecapa_loss=0.0003485, whisper_loss=0.1031, over 3394159.46 frames. ], batch size: 63, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:04:43,418 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 19:04:44,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=149480.0, ans=0.05 2024-08-09 19:04:56,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=149580.0, ans=0.125 2024-08-09 19:04:58,566 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 19:05:05,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=149580.0, ans=0.125 2024-08-09 19:05:11,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=149680.0, ans=0.0 2024-08-09 19:05:11,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=149680.0, ans=0.2 2024-08-09 19:05:16,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2024-08-09 19:05:51,302 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 19:05:53,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=149980.0, ans=0.125 2024-08-09 19:05:54,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 500, loss[loss=0.1121, beats_loss=0.01215, ecapa_loss=0.0003526, whisper_loss=0.09638, over 16867.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01264, ecapa_loss=0.0003462, whisper_loss=0.1024, over 3489848.53 frames. ], batch size: 71, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:05:57,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.962e+01 3.493e+01 4.226e+01 6.986e+01, threshold=6.987e+01, percent-clipped=1.0 2024-08-09 19:05:57,277 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 19:06:00,026 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 19:06:11,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. 
limit=6.0 2024-08-09 19:06:28,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150180.0, ans=0.1 2024-08-09 19:06:29,256 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-09 19:06:37,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=150180.0, ans=0.0 2024-08-09 19:06:39,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2024-08-09 19:06:54,137 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 19:06:54,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:00,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:03,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:09,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5 2024-08-09 19:07:10,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 550, loss[loss=0.157, beats_loss=0.00855, ecapa_loss=0.0003841, whisper_loss=0.1447, over 19972.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01266, ecapa_loss=0.0003449, whisper_loss=0.1019, over 3611284.76 frames. ], batch size: 75, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:07:53,215 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 19:08:02,585 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 19:08:05,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=150780.0, ans=0.0 2024-08-09 19:08:07,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150780.0, ans=0.1 2024-08-09 19:08:07,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150780.0, ans=0.1 2024-08-09 19:08:26,059 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 600, loss[loss=0.09516, beats_loss=0.01258, ecapa_loss=0.0003468, whisper_loss=0.07912, over 16008.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01276, ecapa_loss=0.0003408, whisper_loss=0.1013, over 3674649.30 frames. ], batch size: 62, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:08:28,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.923e+01 3.308e+01 3.857e+01 5.897e+01, threshold=6.616e+01, percent-clipped=0.0 2024-08-09 19:08:29,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2024-08-09 19:08:31,698 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 19:08:38,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-09 19:08:40,963 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 19:08:44,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=151080.0, ans=0.0 2024-08-09 19:08:47,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=151080.0, ans=0.05 2024-08-09 19:08:51,561 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 19:08:57,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=151180.0, ans=0.2 2024-08-09 19:09:00,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=151180.0, ans=12.0 2024-08-09 19:09:01,496 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 19:09:38,297 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.014e+03 2024-08-09 19:09:39,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-09 19:09:40,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 650, loss[loss=0.1272, beats_loss=0.01282, ecapa_loss=0.0003347, whisper_loss=0.1111, over 23629.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01283, ecapa_loss=0.0003398, whisper_loss=0.1014, over 3728397.40 frames. ], batch size: 92, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:09:40,513 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 19:09:52,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=151480.0, ans=0.125 2024-08-09 19:10:05,027 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 19:10:11,491 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-08-09 19:10:15,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-09 19:10:16,581 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-09 19:10:23,620 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 13 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 19:10:55,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 700, loss[loss=0.1146, beats_loss=0.01369, ecapa_loss=0.0003367, whisper_loss=0.09755, over 14658.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01273, ecapa_loss=0.0003397, whisper_loss=0.102, over 3739339.67 frames. ], batch size: 57, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:10:57,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.682e+01 3.217e+01 3.765e+01 7.105e+01, threshold=6.434e+01, percent-clipped=1.0 2024-08-09 19:10:58,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=151980.0, ans=0.09899494936611666 2024-08-09 19:11:17,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2024-08-09 19:11:24,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=15.0 2024-08-09 19:11:31,501 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 19:11:45,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=152280.0, ans=0.0 2024-08-09 19:12:00,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=152380.0, ans=0.0 2024-08-09 19:12:05,891 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 19:12:10,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 750, loss[loss=0.1148, beats_loss=0.01351, ecapa_loss=0.0002701, whisper_loss=0.09855, over 15587.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01271, ecapa_loss=0.0003401, whisper_loss=0.1024, over 3742548.47 frames. ], batch size: 63, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:12:25,792 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 19:13:01,602 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 19:13:05,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.60 vs. limit=10.0 2024-08-09 19:13:15,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2024-08-09 19:13:23,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=152880.0, ans=0.2 2024-08-09 19:13:26,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 800, loss[loss=0.1025, beats_loss=0.01437, ecapa_loss=0.0002991, whisper_loss=0.08517, over 23641.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01278, ecapa_loss=0.0003385, whisper_loss=0.1016, over 3762324.01 frames. 
], batch size: 93, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:13:27,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=152980.0, ans=0.0 2024-08-09 19:13:30,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.796e+01 3.224e+01 3.871e+01 5.736e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-09 19:14:01,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=153180.0, ans=0.125 2024-08-09 19:14:19,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153280.0, ans=0.125 2024-08-09 19:14:25,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153280.0, ans=0.125 2024-08-09 19:14:43,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 850, loss[loss=0.1165, beats_loss=0.01156, ecapa_loss=0.0003522, whisper_loss=0.1014, over 15596.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01276, ecapa_loss=0.0003393, whisper_loss=0.1012, over 3779441.69 frames. ], batch size: 59, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:14:44,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-09 19:14:46,072 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 19:14:50,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=153480.0, ans=0.2 2024-08-09 19:14:51,860 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 19:15:02,704 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
15 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 19:15:06,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=153580.0, ans=0.125 2024-08-09 19:15:09,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=153580.0, ans=0.0 2024-08-09 19:15:18,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=153680.0, ans=0.015 2024-08-09 19:15:20,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153680.0, ans=0.1 2024-08-09 19:15:23,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=153680.0, ans=0.2 2024-08-09 19:16:02,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 900, loss[loss=0.105, beats_loss=0.0128, ecapa_loss=0.0002722, whisper_loss=0.0895, over 16401.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01272, ecapa_loss=0.0003397, whisper_loss=0.1021, over 3798059.17 frames. ], batch size: 58, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:16:05,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.273e+01 2.893e+01 3.249e+01 3.934e+01 7.637e+01, threshold=6.497e+01, percent-clipped=1.0 2024-08-09 19:16:15,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=153980.0, ans=0.0 2024-08-09 19:16:32,093 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 19:16:37,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154180.0, ans=0.125 2024-08-09 19:16:38,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=154180.0, ans=0.125 2024-08-09 19:16:49,216 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 19:16:57,639 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 19:17:11,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=154380.0, ans=0.0 2024-08-09 19:17:13,549 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 19:17:19,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 950, loss[loss=0.109, beats_loss=0.01351, ecapa_loss=0.0002782, whisper_loss=0.09269, over 24275.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01282, ecapa_loss=0.0003356, whisper_loss=0.1015, over 3795328.50 frames. ], batch size: 95, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:17:23,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-09 19:17:32,302 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:17:46,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=154580.0, ans=0.125 2024-08-09 19:18:10,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. 
limit=15.0 2024-08-09 19:18:18,327 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 19:18:22,984 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-09 19:18:23,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=154880.0, ans=0.5 2024-08-09 19:18:37,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=154980.0, ans=0.125 2024-08-09 19:18:37,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1000, loss[loss=0.1222, beats_loss=0.01449, ecapa_loss=0.0002968, whisper_loss=0.1047, over 20032.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01288, ecapa_loss=0.0003335, whisper_loss=0.101, over 3798255.49 frames. ], batch size: 78, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:18:41,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.941e+01 3.307e+01 3.877e+01 7.420e+01, threshold=6.613e+01, percent-clipped=2.0 2024-08-09 19:19:07,550 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 19:19:07,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=155080.0, ans=0.125 2024-08-09 19:19:12,450 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-09 19:19:41,784 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 19:19:43,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=155380.0, ans=10.0 2024-08-09 19:19:59,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1050, loss[loss=0.1235, beats_loss=0.01189, ecapa_loss=0.000282, whisper_loss=0.1088, over 19372.00 frames. 
], tot_loss[loss=0.117, beats_loss=0.01288, ecapa_loss=0.0003308, whisper_loss=0.1008, over 3813185.98 frames. ], batch size: 71, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:20:09,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=155480.0, ans=0.125 2024-08-09 19:20:10,926 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 from AS 2024-08-09 19:20:43,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-09 19:20:43,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=155780.0, ans=0.125 2024-08-09 19:21:01,974 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 from AS 2024-08-09 19:21:13,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1100, loss[loss=0.09063, beats_loss=0.01673, ecapa_loss=0.0002502, whisper_loss=0.0714, over 21931.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01278, ecapa_loss=0.0003311, whisper_loss=0.1015, over 3817497.33 frames. ], batch size: 87, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:21:17,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.935e+01 3.266e+01 4.117e+01 7.646e+01, threshold=6.532e+01, percent-clipped=3.0 2024-08-09 19:21:27,896 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 from AS 2024-08-09 19:21:29,003 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
30 from LS+wenet, 20 from Vox, 39 from AS 2024-08-09 19:22:04,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156280.0, ans=0.125 2024-08-09 19:22:07,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.24 vs. limit=22.5 2024-08-09 19:22:09,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=156380.0, ans=0.0 2024-08-09 19:22:18,937 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 from AS 2024-08-09 19:22:24,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1150, loss[loss=0.09791, beats_loss=0.01485, ecapa_loss=0.0003906, whisper_loss=0.07916, over 15236.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01272, ecapa_loss=0.0003311, whisper_loss=0.1019, over 3823356.30 frames. ], batch size: 63, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:22:31,841 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 11 from Vox, 29 from AS 2024-08-09 19:22:40,093 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 19:22:49,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156680.0, ans=0.1 2024-08-09 19:23:16,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=156880.0, ans=0.0 2024-08-09 19:23:30,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1200, loss[loss=0.1421, beats_loss=0.01242, ecapa_loss=0.0003068, whisper_loss=0.1266, over 21619.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01276, ecapa_loss=0.0003296, whisper_loss=0.1014, over 3841062.93 frames.
], batch size: 81, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:23:33,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.894e+01 3.270e+01 3.890e+01 7.018e+01, threshold=6.539e+01, percent-clipped=1.0 2024-08-09 19:23:35,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-08-09 19:23:46,929 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 19:23:59,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2024-08-09 19:24:00,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=157180.0, ans=0.125 2024-08-09 19:24:00,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=157180.0, ans=0.125 2024-08-09 19:24:04,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=157180.0, ans=0.2 2024-08-09 19:24:11,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0 2024-08-09 19:24:11,766 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 from AS 2024-08-09 19:24:17,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=157280.0, ans=0.0 2024-08-09 19:24:36,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1250, loss[loss=0.103, beats_loss=0.0122, ecapa_loss=0.0002924, whisper_loss=0.08785, over 16954.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01273, ecapa_loss=0.0003313, whisper_loss=0.1012, over 3829441.38 frames.
], batch size: 67, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:24:38,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-09 19:24:40,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=157480.0, ans=0.125 2024-08-09 19:24:42,766 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 from AS 2024-08-09 19:24:47,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=157480.0, ans=0.125 2024-08-09 19:24:57,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-08-09 19:25:12,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157680.0, ans=0.0 2024-08-09 19:25:21,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=157780.0, ans=0.125 2024-08-09 19:25:22,194 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS 2024-08-09 19:25:32,989 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-09 19:25:33,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=157880.0, ans=0.125 2024-08-09 19:25:40,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=157980.0, ans=0.0 2024-08-09 19:25:41,634 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1300, loss[loss=0.1365, beats_loss=0.01013, ecapa_loss=0.0003666, whisper_loss=0.1227, over 23768.00 frames.
], tot_loss[loss=0.1173, beats_loss=0.01275, ecapa_loss=0.0003301, whisper_loss=0.1013, over 3807760.96 frames. ], batch size: 92, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:25:42,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157980.0, ans=0.1 2024-08-09 19:25:44,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.862e+01 3.141e+01 3.804e+01 7.057e+01, threshold=6.283e+01, percent-clipped=1.0 2024-08-09 19:26:01,370 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 19:26:11,967 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 from AS 2024-08-09 19:26:25,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=158280.0, ans=0.0 2024-08-09 19:26:45,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=158380.0, ans=0.2 2024-08-09 19:26:47,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1350, loss[loss=0.114, beats_loss=0.01293, ecapa_loss=0.0002978, whisper_loss=0.09811, over 22139.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01276, ecapa_loss=0.0003316, whisper_loss=0.101, over 3807962.88 frames. ], batch size: 88, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:26:48,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.19 vs.
limit=15.0 2024-08-09 19:26:57,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158480.0, ans=0.125 2024-08-09 19:27:01,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158580.0, ans=0.125 2024-08-09 19:27:37,097 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 15 from Vox, 47 from AS 2024-08-09 19:27:39,686 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 24 from LS+wenet, 26 from Vox, 45 from AS 2024-08-09 19:27:45,989 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 13 from Vox, 40 from AS 2024-08-09 19:27:53,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1400, loss[loss=0.1326, beats_loss=0.0118, ecapa_loss=0.0002426, whisper_loss=0.1184, over 18885.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.0003312, whisper_loss=0.1005, over 3790533.57 frames. ], batch size: 68, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:27:54,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=158980.0, ans=0.0 2024-08-09 19:27:56,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.826e+01 3.197e+01 3.856e+01 5.556e+01, threshold=6.395e+01, percent-clipped=0.0 2024-08-09 19:28:00,722 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 20 from Vox, 38 from AS 2024-08-09 19:28:07,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=159080.0, ans=0.0 2024-08-09 19:28:10,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.28 vs.
limit=22.5 2024-08-09 19:28:11,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=159080.0, ans=0.0 2024-08-09 19:28:11,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=159080.0, ans=0.2 2024-08-09 19:28:18,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=159080.0, ans=0.125 2024-08-09 19:28:30,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-09 19:29:00,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1450, loss[loss=0.1104, beats_loss=0.009785, ecapa_loss=0.0003772, whisper_loss=0.09687, over 16361.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01273, ecapa_loss=0.0003289, whisper_loss=0.1009, over 3814659.18 frames. ], batch size: 65, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:29:40,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=159580.0, ans=0.0 2024-08-09 19:29:44,458 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 from AS 2024-08-09 19:29:49,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=159680.0, ans=0.125 2024-08-09 19:30:02,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159680.0, ans=0.1 2024-08-09 19:30:09,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=159780.0, ans=0.0 2024-08-09 19:30:09,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.49 vs.
limit=15.0 2024-08-09 19:30:18,001 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 21 from Vox, 42 from AS 2024-08-09 19:30:22,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=159880.0, ans=0.125 2024-08-09 19:30:32,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159880.0, ans=0.1 2024-08-09 19:30:32,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=159880.0, ans=0.0 2024-08-09 19:30:34,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1500, loss[loss=0.1243, beats_loss=0.01225, ecapa_loss=0.0002979, whisper_loss=0.109, over 22040.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01279, ecapa_loss=0.0003295, whisper_loss=0.09984, over 3797274.23 frames. ], batch size: 84, lr: 2.87e-02, grad_scale: 4096.0 2024-08-09 19:30:39,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.965e+01 3.414e+01 4.022e+01 6.981e+01, threshold=6.828e+01, percent-clipped=1.0 2024-08-09 19:30:44,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159980.0, ans=0.95 2024-08-09 19:30:54,489 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 from AS 2024-08-09 19:31:00,109 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 19:31:09,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=160180.0, ans=0.025 2024-08-09 19:31:10,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.81 vs.
limit=22.5 2024-08-09 19:31:15,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=160180.0, ans=0.125 2024-08-09 19:31:16,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-08-09 19:31:31,429 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 from AS 2024-08-09 19:31:37,897 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-09 19:31:38,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.40 vs. limit=10.0 2024-08-09 19:31:53,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=160480.0, ans=0.125 2024-08-09 19:31:54,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1550, loss[loss=0.1185, beats_loss=0.01471, ecapa_loss=0.0003177, whisper_loss=0.1006, over 20270.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01284, ecapa_loss=0.0003261, whisper_loss=0.1005, over 3838902.31 frames. ], batch size: 78, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:32:07,917 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 19:32:11,418 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 from AS 2024-08-09 19:32:21,847 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
13 from LS+wenet, 16 from Vox, 33 from AS 2024-08-09 19:32:24,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=160680.0, ans=0.0 2024-08-09 19:32:27,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=160680.0, ans=0.015 2024-08-09 19:32:29,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-08-09 19:32:39,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=160680.0, ans=0.125 2024-08-09 19:32:46,193 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 from AS 2024-08-09 19:33:02,787 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS 2024-08-09 19:33:07,775 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 29 from Vox, 44 from AS 2024-08-09 19:33:09,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=160880.0, ans=0.0 2024-08-09 19:33:12,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1600, loss[loss=0.1229, beats_loss=0.01162, ecapa_loss=0.0003284, whisper_loss=0.108, over 20342.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01287, ecapa_loss=0.0003255, whisper_loss=0.09979, over 3848466.20 frames. ], batch size: 77, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:33:16,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.968e+01 3.450e+01 4.320e+01 7.036e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 19:33:34,987 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
31 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 19:33:35,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.22 vs. limit=15.0 2024-08-09 19:33:41,430 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 17 from Vox, 41 from AS 2024-08-09 19:34:02,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2024-08-09 19:34:04,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=161280.0, ans=0.0 2024-08-09 19:34:20,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=161380.0, ans=0.0 2024-08-09 19:34:30,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1650, loss[loss=0.1286, beats_loss=0.01098, ecapa_loss=0.0003902, whisper_loss=0.1138, over 17669.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01281, ecapa_loss=0.0003256, whisper_loss=0.09995, over 3836410.56 frames. ], batch size: 69, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:34:30,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=161480.0, ans=0.125 2024-08-09 19:34:32,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.89 vs. limit=22.5 2024-08-09 19:34:37,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-09 19:34:45,176 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
27 from LS+wenet, 27 from Vox, 36 from AS 2024-08-09 19:34:56,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2024-08-09 19:34:57,671 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-09 19:35:05,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161680.0, ans=0.1 2024-08-09 19:35:07,609 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 from AS 2024-08-09 19:35:18,597 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS 2024-08-09 19:35:20,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=161780.0, ans=0.0 2024-08-09 19:35:26,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=161780.0, ans=0.125 2024-08-09 19:35:29,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-08-09 19:35:32,029 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 19:35:43,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=161880.0, ans=0.0 2024-08-09 19:35:45,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1700, loss[loss=0.1088, beats_loss=0.01342, ecapa_loss=0.0003552, whisper_loss=0.0918, over 18393.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01271, ecapa_loss=0.0003258, whisper_loss=0.1007, over 3831254.32 frames.
], batch size: 77, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:35:48,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.753e+01 3.153e+01 3.657e+01 6.641e+01, threshold=6.306e+01, percent-clipped=0.0 2024-08-09 19:35:58,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161980.0, ans=0.1 2024-08-09 19:36:07,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.327e+03 2024-08-09 19:36:08,484 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS 2024-08-09 19:36:17,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.36 vs. limit=10.0 2024-08-09 19:36:24,307 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 from AS 2024-08-09 19:36:30,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=162280.0, ans=0.125 2024-08-09 19:36:35,559 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 19:36:47,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-09 19:36:55,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162380.0, ans=0.125 2024-08-09 19:36:59,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1750, loss[loss=0.1254, beats_loss=0.0126, ecapa_loss=0.0003395, whisper_loss=0.1095, over 21940.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01269, ecapa_loss=0.0003268, whisper_loss=0.1012, over 3850649.83 frames.
], batch size: 91, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:37:14,092 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 from AS 2024-08-09 19:37:32,972 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 from AS 2024-08-09 19:37:36,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=162680.0, ans=0.0 2024-08-09 19:37:36,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=162680.0, ans=0.125 2024-08-09 19:37:57,678 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 21 from Vox, 28 from AS 2024-08-09 19:38:01,769 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 19:38:14,484 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 from AS 2024-08-09 19:38:16,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1800, loss[loss=0.1021, beats_loss=0.01487, ecapa_loss=0.0003412, whisper_loss=0.08386, over 22355.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.0127, ecapa_loss=0.0003273, whisper_loss=0.1012, over 3811462.16 frames. ], batch size: 93, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:38:18,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.809e+01 3.330e+01 3.752e+01 6.796e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-09 19:38:19,110 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
18 from LS+wenet, 13 from Vox, 31 from AS 2024-08-09 19:38:25,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=162980.0, ans=0.125 2024-08-09 19:38:31,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=163080.0, ans=0.125 2024-08-09 19:38:39,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=163080.0, ans=0.125 2024-08-09 19:38:42,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=163080.0, ans=0.0 2024-08-09 19:38:46,282 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-09 19:39:13,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=163280.0, ans=0.2 2024-08-09 19:39:31,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1850, loss[loss=0.1013, beats_loss=0.01474, ecapa_loss=0.0003329, whisper_loss=0.08325, over 22777.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01282, ecapa_loss=0.0003289, whisper_loss=0.1003, over 3824188.67 frames. ], batch size: 95, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:39:33,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=163480.0, ans=0.125 2024-08-09 19:39:47,083 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 19:39:48,785 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 19:39:51,484 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
29 from LS+wenet, 30 from Vox, 31 from AS 2024-08-09 19:40:15,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=163780.0, ans=0.125 2024-08-09 19:40:15,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=163780.0, ans=0.125 2024-08-09 19:40:23,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=163780.0, ans=0.0 2024-08-09 19:40:27,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163880.0, ans=0.1 2024-08-09 19:40:31,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=163880.0, ans=0.0 2024-08-09 19:40:42,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1900, loss[loss=0.1273, beats_loss=0.01329, ecapa_loss=0.0003752, whisper_loss=0.1102, over 22921.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01277, ecapa_loss=0.0003363, whisper_loss=0.1002, over 3841954.02 frames. ], batch size: 90, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:40:43,043 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 from AS 2024-08-09 19:40:45,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.888e+01 3.200e+01 3.675e+01 7.363e+01, threshold=6.401e+01, percent-clipped=1.0 2024-08-09 19:40:51,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-09 19:41:03,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs.
limit=15.0 2024-08-09 19:41:03,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-08-09 19:41:15,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-08-09 19:41:16,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=164180.0, ans=0.125 2024-08-09 19:41:28,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=164280.0, ans=0.2 2024-08-09 19:41:33,524 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 from AS 2024-08-09 19:41:35,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=164380.0, ans=0.0 2024-08-09 19:41:47,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=164380.0, ans=0.2 2024-08-09 19:41:48,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164480.0, ans=0.1 2024-08-09 19:41:49,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 1950, loss[loss=0.1067, beats_loss=0.01035, ecapa_loss=0.0004576, whisper_loss=0.09174, over 17933.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01277, ecapa_loss=0.0003427, whisper_loss=0.1004, over 3829848.04 frames. ], batch size: 70, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:41:49,546 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-09 19:41:54,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.75 vs.
limit=15.0 2024-08-09 19:42:10,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=15.0 2024-08-09 19:42:18,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.95 vs. limit=15.0 2024-08-09 19:42:21,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=164680.0, ans=0.0 2024-08-09 19:42:24,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.10 vs. limit=10.0 2024-08-09 19:42:28,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=164780.0, ans=10.0 2024-08-09 19:42:37,685 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 from AS 2024-08-09 19:42:45,286 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 13 from Vox, 44 from AS 2024-08-09 19:42:55,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2000, loss[loss=0.08355, beats_loss=0.01561, ecapa_loss=0.0002469, whisper_loss=0.06547, over 14936.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01276, ecapa_loss=0.0003471, whisper_loss=0.1003, over 3805249.95 frames. ], batch size: 57, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:42:55,873 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 27 from Vox, 26 from AS 2024-08-09 19:42:56,491 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs.
limit=6.0 2024-08-09 19:42:58,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.959e+01 3.174e+01 3.680e+01 5.777e+01, threshold=6.348e+01, percent-clipped=0.0 2024-08-09 19:43:26,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2024-08-09 19:43:36,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165280.0, ans=0.1 2024-08-09 19:43:48,152 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS 2024-08-09 19:43:58,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=165380.0, ans=0.0 2024-08-09 19:44:01,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2050, loss[loss=0.0732, beats_loss=0.01536, ecapa_loss=0.0003088, whisper_loss=0.05475, over 16377.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01275, ecapa_loss=0.0003506, whisper_loss=0.1002, over 3812413.65 frames. ], batch size: 67, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:44:26,567 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-09 19:44:27,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.59 vs. limit=22.5 2024-08-09 19:44:30,548 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS 2024-08-09 19:44:44,680 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-09 19:44:46,085 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
21 from LS+wenet, 13 from Vox, 25 from AS 2024-08-09 19:44:49,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=22.5 2024-08-09 19:45:03,221 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS 2024-08-09 19:45:05,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.67 vs. limit=15.0 2024-08-09 19:45:06,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2100, loss[loss=0.113, beats_loss=0.01304, ecapa_loss=0.0003032, whisper_loss=0.09693, over 17470.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01266, ecapa_loss=0.0003506, whisper_loss=0.101, over 3806882.15 frames. ], batch size: 65, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:45:09,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.923e+01 3.262e+01 4.036e+01 6.421e+01, threshold=6.525e+01, percent-clipped=1.0 2024-08-09 19:45:17,780 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
30 from LS+wenet, 22 from Vox, 34 from AS 2024-08-09 19:45:20,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=166080.0, ans=0.0 2024-08-09 19:45:26,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=166080.0, ans=0.0 2024-08-09 19:45:43,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=166180.0, ans=0.125 2024-08-09 19:45:50,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=166280.0, ans=0.0 2024-08-09 19:45:50,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=166280.0, ans=0.0 2024-08-09 19:45:55,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=166280.0, ans=0.125 2024-08-09 19:46:06,003 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 19:46:06,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=166380.0, ans=0.2 2024-08-09 19:46:12,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2150, loss[loss=0.1339, beats_loss=0.01131, ecapa_loss=0.0003169, whisper_loss=0.1194, over 16404.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01277, ecapa_loss=0.0003511, whisper_loss=0.1004, over 3770484.79 frames. ], batch size: 63, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:46:24,397 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-09 19:46:27,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=166580.0, ans=0.125 2024-08-09 19:46:37,399 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
24 from LS+wenet, 19 from Vox, 25 from AS 2024-08-09 19:46:49,169 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 20 from Vox, 50 from AS 2024-08-09 19:47:00,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=166780.0, ans=0.0 2024-08-09 19:47:08,932 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-09 19:47:14,498 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 19:47:14,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=166880.0, ans=0.0 2024-08-09 19:47:18,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2200, loss[loss=0.1315, beats_loss=0.01331, ecapa_loss=0.0002635, whisper_loss=0.1155, over 19573.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01281, ecapa_loss=0.000353, whisper_loss=0.1009, over 3816176.30 frames. ], batch size: 71, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:47:21,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.890e+01 3.143e+01 3.810e+01 5.998e+01, threshold=6.286e+01, percent-clipped=0.0 2024-08-09 19:47:30,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=167080.0, ans=0.125 2024-08-09 19:47:46,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=15.0 2024-08-09 19:47:47,210 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 from AS 2024-08-09 19:47:56,512 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
25 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 19:47:56,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167280.0, ans=0.0 2024-08-09 19:48:08,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=167280.0, ans=0.125 2024-08-09 19:48:14,170 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.541e+00 2024-08-09 19:48:20,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=167380.0, ans=0.2 2024-08-09 19:48:23,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2250, loss[loss=0.06165, beats_loss=0.01704, ecapa_loss=0.0003473, whisper_loss=0.04113, over 14855.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.0129, ecapa_loss=0.0003547, whisper_loss=0.1008, over 3811187.12 frames. ], batch size: 61, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:48:30,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:48:33,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=167480.0, ans=0.125 2024-08-09 19:48:42,079 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 19:48:44,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=167580.0, ans=0.125 2024-08-09 19:48:47,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167580.0, ans=0.125 2024-08-09 19:48:51,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=167680.0, ans=0.125 2024-08-09 19:48:59,011 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 19:49:05,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=167780.0, ans=0.2 2024-08-09 19:49:28,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2300, loss[loss=0.1171, beats_loss=0.01149, ecapa_loss=0.0004773, whisper_loss=0.1009, over 18506.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01279, ecapa_loss=0.0003535, whisper_loss=0.1017, over 3831316.08 frames. ], batch size: 78, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:49:28,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=167980.0, ans=0.2 2024-08-09 19:49:28,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=167980.0, ans=0.125 2024-08-09 19:49:31,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 3.098e+01 3.355e+01 3.897e+01 6.798e+01, threshold=6.710e+01, percent-clipped=2.0 2024-08-09 19:49:55,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. 
limit=22.5 2024-08-09 19:49:56,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-09 19:49:59,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=12.0 2024-08-09 19:50:13,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=168280.0, ans=0.125 2024-08-09 19:50:34,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2350, loss[loss=0.1178, beats_loss=0.01198, ecapa_loss=0.0004152, whisper_loss=0.1016, over 17918.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01267, ecapa_loss=0.0003538, whisper_loss=0.1028, over 3832351.88 frames. ], batch size: 74, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:50:42,944 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 19:50:53,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168580.0, ans=0.125 2024-08-09 19:51:11,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2024-08-09 19:51:21,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.17 vs. 
limit=22.5 2024-08-09 19:51:33,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168880.0, ans=0.1 2024-08-09 19:51:34,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=168880.0, ans=0.0 2024-08-09 19:51:37,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:40,761 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 19:51:42,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.55 vs. limit=10.0 2024-08-09 19:51:43,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2400, loss[loss=0.09647, beats_loss=0.009768, ecapa_loss=0.0003538, whisper_loss=0.08317, over 16137.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.0127, ecapa_loss=0.0003543, whisper_loss=0.103, over 3836507.79 frames. 
], batch size: 64, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:51:46,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.941e+01 3.344e+01 3.819e+01 6.517e+01, threshold=6.689e+01, percent-clipped=0.0 2024-08-09 19:51:56,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=169080.0, ans=0.125 2024-08-09 19:51:59,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=169080.0, ans=0.125 2024-08-09 19:52:04,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169080.0, ans=0.1 2024-08-09 19:52:15,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=169180.0, ans=0.125 2024-08-09 19:52:32,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2024-08-09 19:52:36,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=169380.0, ans=0.0 2024-08-09 19:52:50,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=169480.0, ans=0.125 2024-08-09 19:52:50,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2450, loss[loss=0.1133, beats_loss=0.01329, ecapa_loss=0.0003681, whisper_loss=0.09629, over 21899.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01266, ecapa_loss=0.0003521, whisper_loss=0.1032, over 3824254.22 frames. ], batch size: 89, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:53:06,540 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 19:53:30,174 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 33 from Vox, 25 fro AS 2024-08-09 19:53:45,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=169880.0, ans=10.0 2024-08-09 19:54:00,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2500, loss[loss=0.122, beats_loss=0.01306, ecapa_loss=0.0003595, whisper_loss=0.1054, over 19372.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01267, ecapa_loss=0.0003528, whisper_loss=0.1028, over 3836421.60 frames. ], batch size: 80, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:54:03,061 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.848e+01 3.405e+01 3.928e+01 5.880e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-09 19:54:14,509 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:54:24,636 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 19:54:25,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=170080.0, ans=0.0 2024-08-09 19:54:36,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=170180.0, ans=0.0 2024-08-09 19:54:48,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=170280.0, ans=0.125 2024-08-09 19:54:49,721 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-09 19:55:08,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=170380.0, ans=0.2 2024-08-09 19:55:12,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2550, loss[loss=0.1339, beats_loss=0.009357, ecapa_loss=0.0004324, whisper_loss=0.1202, over 15960.00 frames. 
], tot_loss[loss=0.119, beats_loss=0.01256, ecapa_loss=0.0003531, whisper_loss=0.1029, over 3836601.30 frames. ], batch size: 62, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:55:16,008 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 19:55:20,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-08-09 19:55:32,062 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.250e-01 2024-08-09 19:55:54,853 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 19:56:08,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=170780.0, ans=0.0 2024-08-09 19:56:09,992 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 34 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-09 19:56:15,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=170880.0, ans=0.0 2024-08-09 19:56:17,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-09 19:56:26,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2600, loss[loss=0.1284, beats_loss=0.01162, ecapa_loss=0.0003106, whisper_loss=0.1137, over 20986.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01259, ecapa_loss=0.0003509, whisper_loss=0.1029, over 3841824.07 frames. 
], batch size: 80, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:56:29,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.011e+01 3.512e+01 4.102e+01 7.361e+01, threshold=7.024e+01, percent-clipped=2.0 2024-08-09 19:56:42,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=171080.0, ans=0.0 2024-08-09 19:56:43,412 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 19:56:46,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=171080.0, ans=0.95 2024-08-09 19:56:47,525 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 19:56:59,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2024-08-09 19:57:00,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=171180.0, ans=0.0 2024-08-09 19:57:02,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=171180.0, ans=0.125 2024-08-09 19:57:11,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=171280.0, ans=0.125 2024-08-09 19:57:18,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171280.0, ans=0.125 2024-08-09 19:57:27,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171380.0, ans=0.1 2024-08-09 19:57:36,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2650, loss[loss=0.0857, beats_loss=0.01444, ecapa_loss=0.0003109, 
whisper_loss=0.06816, over 16172.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01271, ecapa_loss=0.0003521, whisper_loss=0.1019, over 3839402.32 frames. ], batch size: 64, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:57:47,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=171480.0, ans=0.125 2024-08-09 19:57:56,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=171580.0, ans=0.07 2024-08-09 19:57:58,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=15.0 2024-08-09 19:58:21,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=171780.0, ans=0.0 2024-08-09 19:58:35,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5 2024-08-09 19:58:38,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=171880.0, ans=0.0 2024-08-09 19:58:48,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2700, loss[loss=0.1126, beats_loss=0.01551, ecapa_loss=0.0002948, whisper_loss=0.09416, over 18154.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01275, ecapa_loss=0.0003528, whisper_loss=0.1018, over 3851933.44 frames. ], batch size: 71, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:58:51,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.909e+01 3.335e+01 3.725e+01 7.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-09 19:59:06,425 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 19:59:22,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=172180.0, ans=0.0 2024-08-09 19:59:48,128 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 19:59:50,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=172380.0, ans=0.0 2024-08-09 19:59:55,305 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 19:59:59,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2750, loss[loss=0.1093, beats_loss=0.01024, ecapa_loss=0.0004001, whisper_loss=0.09505, over 16447.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01267, ecapa_loss=0.0003538, whisper_loss=0.102, over 3807435.60 frames. ], batch size: 64, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 20:00:28,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=172680.0, ans=0.0 2024-08-09 20:00:29,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172680.0, ans=0.1 2024-08-09 20:00:32,119 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 20:00:33,517 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 20:00:43,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=172780.0, ans=0.2 2024-08-09 20:00:44,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=172780.0, ans=0.125 2024-08-09 20:00:58,702 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.561e+00 2024-08-09 20:01:04,018 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-09 20:01:08,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=172880.0, ans=0.125 2024-08-09 20:01:12,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2800, loss[loss=0.1176, beats_loss=0.01273, ecapa_loss=0.0004634, whisper_loss=0.1003, over 20408.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01281, ecapa_loss=0.0003528, whisper_loss=0.1019, over 3823122.73 frames. ], batch size: 87, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:01:15,186 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 3.001e+01 3.485e+01 3.958e+01 7.033e+01, threshold=6.969e+01, percent-clipped=2.0 2024-08-09 20:01:17,079 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-09 20:01:38,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173080.0, ans=0.1 2024-08-09 20:01:45,635 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 20:01:45,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=173180.0, ans=0.04949747468305833 2024-08-09 20:01:51,205 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 20:01:59,651 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 20:02:04,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=173280.0, ans=0.0 2024-08-09 20:02:06,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=173280.0, ans=0.0 2024-08-09 20:02:16,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=173380.0, ans=0.2 2024-08-09 20:02:18,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=173380.0, ans=0.125 2024-08-09 20:02:24,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2850, loss[loss=0.09197, beats_loss=0.0143, ecapa_loss=0.0003498, whisper_loss=0.07417, over 16989.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01286, ecapa_loss=0.0003506, whisper_loss=0.1018, over 3812207.38 frames. ], batch size: 71, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:02:58,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=173680.0, ans=0.125 2024-08-09 20:03:36,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2900, loss[loss=0.1066, beats_loss=0.01263, ecapa_loss=0.0003948, whisper_loss=0.09007, over 13416.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01281, ecapa_loss=0.0003529, whisper_loss=0.1028, over 3837257.40 frames. 
], batch size: 55, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:03:40,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.065e+01 3.431e+01 3.879e+01 6.098e+01, threshold=6.862e+01, percent-clipped=0.0 2024-08-09 20:03:42,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=173980.0, ans=0.07 2024-08-09 20:03:56,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174080.0, ans=0.1 2024-08-09 20:04:00,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=174080.0, ans=0.0 2024-08-09 20:04:02,147 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 20:04:06,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=174180.0, ans=0.0 2024-08-09 20:04:12,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=174180.0, ans=0.125 2024-08-09 20:04:13,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=174180.0, ans=15.0 2024-08-09 20:04:16,065 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 20:04:28,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174280.0, ans=0.1 2024-08-09 20:04:47,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 2950, loss[loss=0.1257, beats_loss=0.01041, ecapa_loss=0.000404, whisper_loss=0.1113, over 18632.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01278, ecapa_loss=0.0003526, whisper_loss=0.1023, over 3845804.52 frames. 
], batch size: 78, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:04:54,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=174480.0, ans=0.0 2024-08-09 20:04:56,816 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 20:05:02,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=174580.0, ans=0.0 2024-08-09 20:05:23,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-09 20:06:14,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3000, loss[loss=0.104, beats_loss=0.0149, ecapa_loss=0.0003163, whisper_loss=0.0859, over 20997.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01286, ecapa_loss=0.0003516, whisper_loss=0.1018, over 3872793.73 frames. ], batch size: 86, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:06:14,465 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 20:06:58,517 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on ASR_libri: loss=0.2837, beats_loss=0, ecapa_loss=0.001014, whisper_loss=0.2736, over 922467.00 frames. 2024-08-09 20:07:17,262 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on SV_voxceleb1: loss=0.009278, beats_loss=0, ecapa_loss=0.0009278, whisper_loss=0, over 939242.00 frames. 
2024-08-09 20:08:29,941 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3886, 1.5688, 1.3926, 1.3239, 1.3054, 1.4160, 1.6082, 1.6569], device='cuda:3') 2024-08-09 20:08:46,921 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1333, 4.9954, 4.7860, 5.0983], device='cuda:3') 2024-08-09 20:08:50,881 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on AT_audioset: loss=0.03024, beats_loss=0.03024, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 20:08:50,885 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 20:08:53,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.977e+01 3.430e+01 4.027e+01 7.550e+01, threshold=6.860e+01, percent-clipped=3.0 2024-08-09 20:08:55,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=174980.0, ans=0.2 2024-08-09 20:08:57,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=174980.0, ans=0.0 2024-08-09 20:09:04,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. limit=10.0 2024-08-09 20:09:05,485 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-09 20:09:07,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0 2024-08-09 20:09:12,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=175080.0, ans=0.2 2024-08-09 20:09:15,918 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 20:09:28,209 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 20:09:40,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-09 20:09:44,186 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 20:09:54,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2024-08-09 20:10:06,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=175380.0, ans=22.5 2024-08-09 20:10:07,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175380.0, ans=0.1 2024-08-09 20:10:28,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3050, loss[loss=0.1138, beats_loss=0.01481, ecapa_loss=0.0002814, whisper_loss=0.09615, over 23810.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01288, ecapa_loss=0.0003497, whisper_loss=0.1027, over 3901019.22 frames. ], batch size: 90, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:10:29,658 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 20:10:37,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2024-08-09 20:10:39,629 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
38 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 20:10:39,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=175480.0, ans=0.0 2024-08-09 20:12:21,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3100, loss[loss=0.1295, beats_loss=0.01039, ecapa_loss=0.0003372, whisper_loss=0.1157, over 15055.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01283, ecapa_loss=0.0003503, whisper_loss=0.103, over 3899042.10 frames. ], batch size: 56, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:12:24,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=175980.0, ans=0.0 2024-08-09 20:12:25,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.112e+01 3.600e+01 4.119e+01 8.540e+01, threshold=7.200e+01, percent-clipped=4.0 2024-08-09 20:13:09,017 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 20:13:11,431 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.084e+00 2024-08-09 20:13:37,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=176280.0, ans=0.125 2024-08-09 20:13:41,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=176280.0, ans=0.95 2024-08-09 20:13:44,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-09 20:14:08,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3150, loss[loss=0.1293, beats_loss=0.0135, ecapa_loss=0.0003186, whisper_loss=0.1126, over 21972.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01278, ecapa_loss=0.0003507, whisper_loss=0.1033, over 3885826.65 frames. 
], batch size: 86, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:14:20,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176480.0, ans=0.125 2024-08-09 20:14:41,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=176580.0, ans=0.0 2024-08-09 20:15:36,773 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 39 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 20:15:38,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=176880.0, ans=0.0 2024-08-09 20:15:42,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=176880.0, ans=0.125 2024-08-09 20:15:46,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3200, loss[loss=0.1348, beats_loss=0.01439, ecapa_loss=0.0002939, whisper_loss=0.1175, over 21814.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01276, ecapa_loss=0.000349, whisper_loss=0.1038, over 3869219.59 frames. ], batch size: 84, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:15:49,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.848e+01 3.292e+01 3.822e+01 6.429e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-09 20:16:00,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=177080.0, ans=0.125 2024-08-09 20:16:39,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=177280.0, ans=0.125 2024-08-09 20:17:00,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3250, loss[loss=0.1045, beats_loss=0.01419, ecapa_loss=0.0003162, whisper_loss=0.08714, over 13564.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01274, ecapa_loss=0.0003492, whisper_loss=0.104, over 3892265.98 frames.
], batch size: 54, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:17:10,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=177480.0, ans=0.125 2024-08-09 20:17:11,797 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 16 from Vox, 39 from AS 2024-08-09 20:17:17,530 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS 2024-08-09 20:17:21,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=177580.0, ans=0.125 2024-08-09 20:17:33,931 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-09 20:17:41,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=177680.0, ans=0.0 2024-08-09 20:17:42,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177780.0, ans=0.1 2024-08-09 20:17:54,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2024-08-09 20:18:00,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=177880.0, ans=0.95 2024-08-09 20:18:02,966 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts.
25 from LS+wenet, 14 from Vox, 41 from AS 2024-08-09 20:18:07,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=177880.0, ans=0.0 2024-08-09 20:18:11,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=177880.0, ans=0.5 2024-08-09 20:18:14,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3300, loss[loss=0.09898, beats_loss=0.01601, ecapa_loss=0.0003497, whisper_loss=0.07947, over 21054.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01279, ecapa_loss=0.0003484, whisper_loss=0.1037, over 3872372.16 frames. ], batch size: 91, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:18:18,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.080e+01 3.504e+01 4.263e+01 7.840e+01, threshold=7.009e+01, percent-clipped=4.0 2024-08-09 20:18:30,081 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-09 20:18:34,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=178080.0, ans=0.125 2024-08-09 20:18:58,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-08-09 20:19:16,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5 2024-08-09 20:19:36,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3350, loss[loss=0.1157, beats_loss=0.01191, ecapa_loss=0.0003681, whisper_loss=0.1001, over 17917.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01277, ecapa_loss=0.0003469, whisper_loss=0.1036, over 3888901.77 frames. ], batch size: 71, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:19:45,128 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
21 from LS+wenet, 13 from Vox, 22 from AS 2024-08-09 20:19:46,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178480.0, ans=0.1 2024-08-09 20:19:59,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=178580.0, ans=0.0 2024-08-09 20:20:20,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=178680.0, ans=0.125 2024-08-09 20:20:39,251 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 from AS 2024-08-09 20:20:43,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2024-08-09 20:20:43,866 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 from AS 2024-08-09 20:20:58,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3400, loss[loss=0.08923, beats_loss=0.01397, ecapa_loss=0.0002952, whisper_loss=0.07231, over 14998.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01278, ecapa_loss=0.0003477, whisper_loss=0.1035, over 3909225.96 frames. ], batch size: 58, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:21:00,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.994e+01 3.327e+01 4.294e+01 6.950e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 20:21:00,899 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 20:21:51,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=179280.0, ans=0.125 2024-08-09 20:22:08,501 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 20:22:08,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=179380.0, ans=0.0 2024-08-09 20:22:10,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=179380.0, ans=0.07 2024-08-09 20:22:21,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3450, loss[loss=0.1024, beats_loss=0.01475, ecapa_loss=0.0003138, whisper_loss=0.08452, over 14740.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.0128, ecapa_loss=0.0003479, whisper_loss=0.1029, over 3916913.27 frames. ], batch size: 59, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:22:25,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=12.0 2024-08-09 20:22:46,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179580.0, ans=0.1 2024-08-09 20:23:09,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=179780.0, ans=0.2 2024-08-09 20:23:16,064 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 from AS 2024-08-09 20:23:21,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179780.0, ans=0.1 2024-08-09 20:23:27,953 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-09 20:23:36,860 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 from AS 2024-08-09 20:23:43,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3500, loss[loss=0.1204, beats_loss=0.01165, ecapa_loss=0.0003743, whisper_loss=0.105, over 18923.00 frames.
], tot_loss[loss=0.1187, beats_loss=0.01281, ecapa_loss=0.0003481, whisper_loss=0.1024, over 3906206.15 frames. ], batch size: 76, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:23:47,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.966e+01 3.324e+01 3.987e+01 6.193e+01, threshold=6.648e+01, percent-clipped=0.0 2024-08-09 20:23:51,514 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 from AS 2024-08-09 20:23:51,788 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:23:59,982 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 from AS 2024-08-09 20:24:05,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=180080.0, ans=0.0 2024-08-09 20:24:21,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180180.0, ans=0.1 2024-08-09 20:24:42,040 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 from AS 2024-08-09 20:25:00,494 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 from AS 2024-08-09 20:25:08,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3550, loss[loss=0.1128, beats_loss=0.01392, ecapa_loss=0.0003767, whisper_loss=0.09514, over 18867.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01284, ecapa_loss=0.0003473, whisper_loss=0.1019, over 3906308.92 frames. ], batch size: 81, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:25:18,473 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-09 20:25:18,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs.
limit=15.0 2024-08-09 20:25:35,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=180580.0, ans=0.125 2024-08-09 20:25:43,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=180680.0, ans=0.0 2024-08-09 20:26:07,104 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-09 20:26:22,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=180880.0, ans=0.0 2024-08-09 20:26:26,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=10.0 2024-08-09 20:26:34,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=180980.0, ans=0.125 2024-08-09 20:26:35,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3600, loss[loss=0.1403, beats_loss=0.01064, ecapa_loss=0.000384, whisper_loss=0.1258, over 21963.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.0128, ecapa_loss=0.0003473, whisper_loss=0.1019, over 3904165.47 frames. ], batch size: 86, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:26:38,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.970e+01 3.508e+01 4.140e+01 6.583e+01, threshold=7.015e+01, percent-clipped=0.0 2024-08-09 20:26:45,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=180980.0, ans=0.025 2024-08-09 20:27:02,600 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS 2024-08-09 20:27:40,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs.
limit=22.5 2024-08-09 20:27:56,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3650, loss[loss=0.1224, beats_loss=0.01397, ecapa_loss=0.0002992, whisper_loss=0.1055, over 18683.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01286, ecapa_loss=0.0003454, whisper_loss=0.1017, over 3891844.12 frames. ], batch size: 75, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:28:15,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181580.0, ans=0.1 2024-08-09 20:28:31,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=181680.0, ans=0.02 2024-08-09 20:28:43,073 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-09 20:28:52,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2024-08-09 20:29:08,221 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-09 20:29:11,478 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 from AS 2024-08-09 20:29:19,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3700, loss[loss=0.1288, beats_loss=0.01177, ecapa_loss=0.0003594, whisper_loss=0.1135, over 17891.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01278, ecapa_loss=0.0003458, whisper_loss=0.1018, over 3874251.81 frames.
], batch size: 71, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:29:21,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=181980.0, ans=0.07 2024-08-09 20:29:22,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.937e+01 3.354e+01 4.017e+01 7.791e+01, threshold=6.707e+01, percent-clipped=1.0 2024-08-09 20:29:37,064 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-09 20:29:40,435 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-09 20:29:59,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.73 vs. limit=10.0 2024-08-09 20:30:16,180 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 16 from Vox, 33 from AS 2024-08-09 20:30:39,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3750, loss[loss=0.1162, beats_loss=0.01486, ecapa_loss=0.0003299, whisper_loss=0.09804, over 22279.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01281, ecapa_loss=0.0003498, whisper_loss=0.1016, over 3875941.41 frames. ], batch size: 93, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:30:44,324 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 20:31:08,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2024-08-09 20:31:16,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs.
limit=6.0 2024-08-09 20:31:25,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=182680.0, ans=0.09899494936611666 2024-08-09 20:31:33,043 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 20:31:51,915 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.955e-02 2024-08-09 20:31:59,363 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3800, loss[loss=0.119, beats_loss=0.01232, ecapa_loss=0.0003703, whisper_loss=0.103, over 16633.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01287, ecapa_loss=0.0003515, whisper_loss=0.1015, over 3859414.55 frames. ], batch size: 64, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:31:59,546 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 from AS 2024-08-09 20:32:01,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.977e+01 3.395e+01 3.964e+01 6.825e+01, threshold=6.789e+01, percent-clipped=1.0 2024-08-09 20:32:02,858 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS 2024-08-09 20:32:07,366 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 20:32:18,017 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.243e+02 2024-08-09 20:32:21,622 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 from AS 2024-08-09 20:32:34,830 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 from AS 2024-08-09 20:32:53,326 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 11 from Vox, 33 from AS 2024-08-09 20:32:55,755 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
27 from LS+wenet, 19 from Vox, 44 from AS 2024-08-09 20:33:16,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3850, loss[loss=0.1209, beats_loss=0.01383, ecapa_loss=0.0003104, whisper_loss=0.1039, over 20555.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01288, ecapa_loss=0.0003497, whisper_loss=0.1012, over 3814485.54 frames. ], batch size: 81, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:33:19,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=183480.0, ans=0.2 2024-08-09 20:33:27,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-09 20:33:29,819 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 20:33:43,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=183580.0, ans=0.2 2024-08-09 20:33:57,218 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 from AS 2024-08-09 20:34:00,736 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 20:34:10,911 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 from AS 2024-08-09 20:34:25,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=183880.0, ans=0.125 2024-08-09 20:34:32,312 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 20:34:35,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3900, loss[loss=0.1399, beats_loss=0.01025, ecapa_loss=0.0004284, whisper_loss=0.1254, over 21728.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01281, ecapa_loss=0.0003506, whisper_loss=0.102, over 3845651.26 frames.
], batch size: 91, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:34:38,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.932e+01 3.278e+01 3.846e+01 7.989e+01, threshold=6.556e+01, percent-clipped=2.0 2024-08-09 20:34:42,631 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 22 from Vox, 18 from AS 2024-08-09 20:34:46,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=183980.0, ans=0.0 2024-08-09 20:34:57,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2024-08-09 20:34:57,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-08-09 20:35:15,289 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 from AS 2024-08-09 20:35:22,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=184280.0, ans=0.1 2024-08-09 20:35:25,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=184280.0, ans=0.0 2024-08-09 20:35:31,667 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 20:35:33,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=184280.0, ans=0.125 2024-08-09 20:35:56,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 3950, loss[loss=0.1221, beats_loss=0.01487, ecapa_loss=0.0002808, whisper_loss=0.1045, over 23268.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01285, ecapa_loss=0.0003474, whisper_loss=0.102, over 3878080.02 frames. ], batch size: 91, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:36:01,076 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
34 from LS+wenet, 21 from Vox, 31 from AS 2024-08-09 20:36:04,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=184480.0, ans=0.125 2024-08-09 20:36:26,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184580.0, ans=0.125 2024-08-09 20:36:43,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=184780.0, ans=0.2 2024-08-09 20:36:57,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=184880.0, ans=0.1 2024-08-09 20:36:59,265 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 from AS 2024-08-09 20:37:11,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=184880.0, ans=0.125 2024-08-09 20:37:14,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4000, loss[loss=0.08313, beats_loss=0.01622, ecapa_loss=0.0003038, whisper_loss=0.06388, over 14120.00 frames. ], tot_loss[loss=0.118, beats_loss=0.0129, ecapa_loss=0.000345, whisper_loss=0.1016, over 3877792.43 frames. ], batch size: 58, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:37:15,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=184980.0, ans=0.0 2024-08-09 20:37:17,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+01 2.965e+01 3.379e+01 3.827e+01 6.548e+01, threshold=6.758e+01, percent-clipped=0.0 2024-08-09 20:37:26,000 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 20:37:37,633 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
20 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 20:37:40,534 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 from AS 2024-08-09 20:37:45,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185180.0, ans=0.1 2024-08-09 20:37:49,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=185180.0, ans=0.125 2024-08-09 20:38:01,648 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 24 from Vox, 24 from AS 2024-08-09 20:38:03,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=185280.0, ans=0.125 2024-08-09 20:38:10,419 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 20:38:14,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:20,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:25,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:30,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4050, loss[loss=0.1197, beats_loss=0.01264, ecapa_loss=0.000382, whisper_loss=0.1032, over 22797.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01284, ecapa_loss=0.000347, whisper_loss=0.1022, over 3880172.49 frames. ], batch size: 92, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:38:46,284 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
21 from LS+wenet, 22 from Vox, 18 from AS 2024-08-09 20:38:47,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=185580.0, ans=0.0 2024-08-09 20:38:56,938 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 20:39:03,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=185680.0, ans=0.125 2024-08-09 20:39:10,080 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 from AS 2024-08-09 20:39:20,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=185780.0, ans=0.07 2024-08-09 20:39:34,440 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 20:39:39,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4100, loss[loss=0.1326, beats_loss=0.00944, ecapa_loss=0.0004341, whisper_loss=0.1188, over 16593.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01268, ecapa_loss=0.0003497, whisper_loss=0.1026, over 3859529.06 frames. ], batch size: 66, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:39:42,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 3.015e+01 3.336e+01 4.132e+01 1.372e+02, threshold=6.672e+01, percent-clipped=1.0 2024-08-09 20:39:43,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185980.0, ans=0.1 2024-08-09 20:40:17,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs.
limit=10.0 2024-08-09 20:40:24,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=186280.0, ans=0.125 2024-08-09 20:40:27,732 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 20:40:45,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4150, loss[loss=0.1371, beats_loss=0.01218, ecapa_loss=0.0003124, whisper_loss=0.1218, over 22421.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01271, ecapa_loss=0.0003514, whisper_loss=0.1021, over 3850342.41 frames. ], batch size: 87, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:40:57,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=186480.0, ans=0.125 2024-08-09 20:41:15,518 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 from AS 2024-08-09 20:41:26,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=186780.0, ans=0.125 2024-08-09 20:41:38,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=186880.0, ans=0.0 2024-08-09 20:41:52,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4200, loss[loss=0.1217, beats_loss=0.0146, ecapa_loss=0.0003659, whisper_loss=0.1034, over 17136.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01284, ecapa_loss=0.000349, whisper_loss=0.1016, over 3873192.92 frames.
], batch size: 69, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:41:54,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.958e+01 3.347e+01 3.898e+01 6.800e+01, threshold=6.694e+01, percent-clipped=1.0 2024-08-09 20:42:07,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=187080.0, ans=0.2 2024-08-09 20:42:10,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=187080.0, ans=0.125 2024-08-09 20:42:20,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=187180.0, ans=0.1 2024-08-09 20:42:20,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-09 20:42:22,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=187180.0, ans=0.125 2024-08-09 20:42:33,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=187280.0, ans=0.1 2024-08-09 20:42:37,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=187280.0, ans=0.2 2024-08-09 20:42:44,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=187380.0, ans=0.0 2024-08-09 20:42:58,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4250, loss[loss=0.1053, beats_loss=0.01286, ecapa_loss=0.0003401, whisper_loss=0.08909, over 14170.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01283, ecapa_loss=0.0003455, whisper_loss=0.1012, over 3867225.49 frames. 
], batch size: 54, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:43:03,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=187480.0, ans=0.125 2024-08-09 20:43:12,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=187580.0, ans=0.125 2024-08-09 20:43:16,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=187580.0, ans=0.125 2024-08-09 20:43:18,944 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 17 from LS+wenet, 34 from Vox, 35 from AS 2024-08-09 20:43:20,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=187580.0, ans=0.0 2024-08-09 20:43:29,743 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS 2024-08-09 20:43:33,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=187680.0, ans=0.2 2024-08-09 20:44:03,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4300, loss[loss=0.1252, beats_loss=0.0143, ecapa_loss=0.0002854, whisper_loss=0.108, over 24157.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01284, ecapa_loss=0.0003482, whisper_loss=0.1006, over 3855794.60 frames. ], batch size: 94, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:44:06,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.942e+01 3.508e+01 4.302e+01 6.032e+01, threshold=7.016e+01, percent-clipped=0.0 2024-08-09 20:44:07,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=187980.0, ans=0.125 2024-08-09 20:44:33,285 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 20:44:33,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=188180.0, ans=0.2 2024-08-09 20:44:50,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=188280.0, ans=0.125 2024-08-09 20:45:03,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=188380.0, ans=0.125 2024-08-09 20:45:07,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2024-08-09 20:45:09,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4350, loss[loss=0.08412, beats_loss=0.01335, ecapa_loss=0.0003374, whisper_loss=0.0674, over 18802.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.0128, ecapa_loss=0.000349, whisper_loss=0.1001, over 3824795.42 frames. ], batch size: 75, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:45:19,297 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 20:45:43,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=188680.0, ans=0.0 2024-08-09 20:45:53,460 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 20:45:58,517 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 20:45:59,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=10.0 2024-08-09 20:46:20,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4400, loss[loss=0.09995, beats_loss=0.01561, ecapa_loss=0.0003271, whisper_loss=0.08107, over 17702.00 frames. 
], tot_loss[loss=0.1167, beats_loss=0.01279, ecapa_loss=0.0003471, whisper_loss=0.1005, over 3823698.63 frames. ], batch size: 73, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:46:23,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.890e+01 3.311e+01 3.807e+01 6.108e+01, threshold=6.622e+01, percent-clipped=0.0 2024-08-09 20:46:25,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=188980.0, ans=0.125 2024-08-09 20:46:25,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=188980.0, ans=0.125 2024-08-09 20:46:26,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=188980.0, ans=0.1 2024-08-09 20:46:53,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=189180.0, ans=0.0 2024-08-09 20:47:00,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=189180.0, ans=0.2 2024-08-09 20:47:09,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=8.0 2024-08-09 20:47:12,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-09 20:47:21,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=189380.0, ans=0.125 2024-08-09 20:47:21,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=189380.0, ans=15.0 2024-08-09 20:47:24,358 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-09 20:47:32,409 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 20:47:38,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4450, loss[loss=0.1268, beats_loss=0.009967, ecapa_loss=0.0003751, whisper_loss=0.1131, over 22740.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01274, ecapa_loss=0.0003468, whisper_loss=0.1011, over 3835606.87 frames. ], batch size: 91, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:47:49,222 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 20:47:56,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=189580.0, ans=0.125 2024-08-09 20:48:09,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=189580.0, ans=0.0 2024-08-09 20:48:53,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5 2024-08-09 20:48:56,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189880.0, ans=0.125 2024-08-09 20:48:57,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2024-08-09 20:49:02,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4500, loss[loss=0.125, beats_loss=0.01224, ecapa_loss=0.0003623, whisper_loss=0.1092, over 20675.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01267, ecapa_loss=0.0003462, whisper_loss=0.1017, over 3854922.71 frames. 
], batch size: 83, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:49:06,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.955e+01 3.431e+01 3.879e+01 5.998e+01, threshold=6.863e+01, percent-clipped=0.0 2024-08-09 20:49:13,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2024-08-09 20:49:40,948 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 20:49:58,860 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 20:50:10,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190380.0, ans=0.1 2024-08-09 20:50:15,419 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-09 20:50:16,971 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 20:50:19,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=190380.0, ans=0.125 2024-08-09 20:50:24,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4550, loss[loss=0.1309, beats_loss=0.01231, ecapa_loss=0.0003539, whisper_loss=0.115, over 20182.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01273, ecapa_loss=0.0003468, whisper_loss=0.1014, over 3863647.47 frames. 
], batch size: 79, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:50:47,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=190580.0, ans=0.02 2024-08-09 20:50:53,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=190580.0, ans=0.125 2024-08-09 20:50:57,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=15.0 2024-08-09 20:51:02,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=190680.0, ans=0.07 2024-08-09 20:51:07,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190680.0, ans=0.1 2024-08-09 20:51:33,360 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 20:51:33,650 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.037e-01 2024-08-09 20:51:38,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=190880.0, ans=0.0 2024-08-09 20:51:45,642 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4600, loss[loss=0.102, beats_loss=0.01127, ecapa_loss=0.0004291, whisper_loss=0.08647, over 14978.00 frames. ], tot_loss[loss=0.118, beats_loss=0.0127, ecapa_loss=0.0003486, whisper_loss=0.1019, over 3866296.08 frames. 
], batch size: 62, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:51:48,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.933e+01 3.481e+01 4.250e+01 8.633e+01, threshold=6.961e+01, percent-clipped=3.0 2024-08-09 20:51:56,431 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.969e+03 2024-08-09 20:52:02,045 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 20:52:04,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5 2024-08-09 20:52:20,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=191180.0, ans=0.125 2024-08-09 20:52:22,277 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 28 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-09 20:52:42,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=191280.0, ans=0.2 2024-08-09 20:52:50,374 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-09 20:52:55,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=191380.0, ans=0.035 2024-08-09 20:53:05,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4650, loss[loss=0.1076, beats_loss=0.01657, ecapa_loss=0.0003379, whisper_loss=0.08768, over 18811.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01276, ecapa_loss=0.0003473, whisper_loss=0.1019, over 3876237.22 frames. 
], batch size: 76, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:53:09,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=191480.0, ans=0.0 2024-08-09 20:53:12,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2024-08-09 20:53:14,939 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 20:53:28,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=191580.0, ans=0.125 2024-08-09 20:53:40,031 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 20:54:25,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4700, loss[loss=0.1139, beats_loss=0.01225, ecapa_loss=0.0003827, whisper_loss=0.09785, over 19528.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01281, ecapa_loss=0.0003459, whisper_loss=0.1024, over 3897457.53 frames. ], batch size: 80, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:54:28,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.995e+01 3.606e+01 4.056e+01 7.854e+01, threshold=7.212e+01, percent-clipped=1.0 2024-08-09 20:54:29,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-09 20:54:53,969 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 20:55:02,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=192180.0, ans=0.95 2024-08-09 20:55:06,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=192180.0, ans=0.035 2024-08-09 20:55:10,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=192180.0, ans=0.0 2024-08-09 20:55:43,409 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 20:55:45,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4750, loss[loss=0.1153, beats_loss=0.01297, ecapa_loss=0.0003481, whisper_loss=0.09889, over 21863.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01271, ecapa_loss=0.0003489, whisper_loss=0.1039, over 3911494.92 frames. ], batch size: 88, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:55:54,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2024-08-09 20:56:19,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=192680.0, ans=0.125 2024-08-09 20:56:22,353 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 20:56:31,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2024-08-09 20:56:52,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=192880.0, ans=0.2 2024-08-09 20:57:04,165 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4800, loss[loss=0.1201, beats_loss=0.01211, ecapa_loss=0.0003371, whisper_loss=0.1046, over 18281.00 frames. 
], tot_loss[loss=0.1198, beats_loss=0.01278, ecapa_loss=0.0003481, whisper_loss=0.1035, over 3933602.59 frames. ], batch size: 74, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:57:07,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 3.258e+01 3.599e+01 4.060e+01 6.614e+01, threshold=7.198e+01, percent-clipped=0.0 2024-08-09 20:57:12,535 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 20:57:17,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=192980.0, ans=0.2 2024-08-09 20:57:19,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=193080.0, ans=22.5 2024-08-09 20:57:22,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=193080.0, ans=0.125 2024-08-09 20:57:29,084 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-09 20:57:29,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=193080.0, ans=0.125 2024-08-09 20:57:44,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=193180.0, ans=0.0 2024-08-09 20:57:56,790 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-09 20:58:07,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=193380.0, ans=0.0 2024-08-09 20:58:17,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4850, loss[loss=0.09465, beats_loss=0.01367, ecapa_loss=0.0004523, whisper_loss=0.07645, over 20097.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01282, ecapa_loss=0.0003492, whisper_loss=0.1032, over 3917847.14 frames. 
], batch size: 91, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:58:39,108 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 20:58:43,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=193580.0, ans=0.2 2024-08-09 20:58:44,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=193680.0, ans=0.125 2024-08-09 20:58:58,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=193780.0, ans=0.0 2024-08-09 20:59:01,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-08-09 20:59:03,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.79 vs. limit=22.5 2024-08-09 20:59:18,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=193880.0, ans=0.07 2024-08-09 20:59:21,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=193880.0, ans=0.2 2024-08-09 20:59:26,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=193980.0, ans=0.2 2024-08-09 20:59:27,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4900, loss[loss=0.1158, beats_loss=0.01426, ecapa_loss=0.0003321, whisper_loss=0.09821, over 19667.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01283, ecapa_loss=0.000348, whisper_loss=0.1023, over 3890948.18 frames. 
], batch size: 84, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:59:29,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=193980.0, ans=10.0 2024-08-09 20:59:29,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.59 vs. limit=22.5 2024-08-09 20:59:30,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.990e+01 3.252e+01 3.746e+01 5.696e+01, threshold=6.504e+01, percent-clipped=0.0 2024-08-09 20:59:36,097 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-09 20:59:43,909 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 20:59:48,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=194080.0, ans=0.125 2024-08-09 20:59:49,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194080.0, ans=0.1 2024-08-09 20:59:56,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=194180.0, ans=0.125 2024-08-09 21:00:06,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=194180.0, ans=0.0 2024-08-09 21:00:12,837 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-09 21:00:32,175 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 21:00:35,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.63 vs. 
limit=22.5 2024-08-09 21:00:36,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 4950, loss[loss=0.134, beats_loss=0.01313, ecapa_loss=0.0002967, whisper_loss=0.1179, over 17343.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01276, ecapa_loss=0.0003467, whisper_loss=0.1027, over 3838128.16 frames. ], batch size: 66, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 21:00:45,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=194480.0, ans=0.125 2024-08-09 21:00:49,800 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 21:01:03,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=194680.0, ans=0.125 2024-08-09 21:01:04,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=194680.0, ans=0.0 2024-08-09 21:01:04,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=194680.0, ans=0.09899494936611666 2024-08-09 21:01:25,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=194780.0, ans=0.125 2024-08-09 21:01:40,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=194880.0, ans=0.2 2024-08-09 21:01:41,683 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 21:01:43,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5000, loss[loss=0.1035, beats_loss=0.01343, ecapa_loss=0.0003598, whisper_loss=0.08652, over 20731.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01279, ecapa_loss=0.0003466, whisper_loss=0.1025, over 3852436.69 frames. 
], batch size: 88, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:01:46,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.882e+01 3.259e+01 3.861e+01 5.497e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-09 21:01:48,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=194980.0, ans=0.0 2024-08-09 21:01:55,851 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 21:01:58,709 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 17 from LS+wenet, 29 from Vox, 47 fro AS 2024-08-09 21:01:58,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=195080.0, ans=0.125 2024-08-09 21:02:01,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195080.0, ans=0.1 2024-08-09 21:02:02,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-08-09 21:02:29,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-08-09 21:02:38,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=195380.0, ans=0.2 2024-08-09 21:02:51,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5050, loss[loss=0.1088, beats_loss=0.01563, ecapa_loss=0.0003159, whisper_loss=0.09001, over 20141.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.0129, ecapa_loss=0.0003473, whisper_loss=0.1024, over 3853432.41 frames. 
], batch size: 81, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:02:57,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0 2024-08-09 21:03:29,877 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:03:33,271 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-09 21:03:36,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=22.5 2024-08-09 21:03:45,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=195880.0, ans=10.0 2024-08-09 21:03:48,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=195880.0, ans=0.125 2024-08-09 21:03:51,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=195880.0, ans=0.2 2024-08-09 21:03:57,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5100, loss[loss=0.1339, beats_loss=0.01125, ecapa_loss=0.000358, whisper_loss=0.1191, over 22894.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01282, ecapa_loss=0.0003464, whisper_loss=0.102, over 3859009.60 frames. ], batch size: 89, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:57,516 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 21:03:59,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.875e+01 3.306e+01 3.993e+01 6.485e+01, threshold=6.613e+01, percent-clipped=0.0 2024-08-09 21:04:29,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196180.0, ans=0.1 2024-08-09 21:04:33,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=196180.0, ans=0.0 2024-08-09 21:04:37,406 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-09 21:04:40,096 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 21:04:47,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.03 vs. limit=15.0 2024-08-09 21:05:00,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=196380.0, ans=0.125 2024-08-09 21:05:05,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5150, loss[loss=0.1074, beats_loss=0.01271, ecapa_loss=0.0004273, whisper_loss=0.09037, over 21487.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01275, ecapa_loss=0.0003439, whisper_loss=0.1025, over 3883636.88 frames. 
], batch size: 92, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:05:19,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=196580.0, ans=0.0 2024-08-09 21:05:24,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=196580.0, ans=0.2 2024-08-09 21:05:31,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=196680.0, ans=0.125 2024-08-09 21:05:34,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-09 21:05:40,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=196680.0, ans=0.125 2024-08-09 21:05:48,238 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 21:06:02,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=196880.0, ans=10.0 2024-08-09 21:06:10,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=196880.0, ans=0.125 2024-08-09 21:06:11,999 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 21:06:13,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5200, loss[loss=0.1089, beats_loss=0.01429, ecapa_loss=0.0003532, whisper_loss=0.09108, over 20339.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01279, ecapa_loss=0.0003408, whisper_loss=0.1017, over 3858705.83 frames. 
], batch size: 86, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:06:16,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.861e+01 3.315e+01 3.921e+01 5.764e+01, threshold=6.630e+01, percent-clipped=0.0 2024-08-09 21:06:19,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=196980.0, ans=0.125 2024-08-09 21:06:20,265 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 21:06:21,764 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 21:06:32,178 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:06:42,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=15.0 2024-08-09 21:06:44,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=197180.0, ans=10.0 2024-08-09 21:06:50,642 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-09 21:06:51,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2024-08-09 21:06:58,774 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 21:07:02,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-09 21:07:06,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=197380.0, ans=0.125 2024-08-09 21:07:13,372 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 21:07:20,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2024-08-09 21:07:21,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5250, loss[loss=0.09699, beats_loss=0.01356, ecapa_loss=0.0003521, whisper_loss=0.0799, over 14528.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01282, ecapa_loss=0.0003404, whisper_loss=0.1013, over 3853525.61 frames. ], batch size: 59, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:07:25,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=197480.0, ans=0.125 2024-08-09 21:07:28,353 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 21:07:35,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=197580.0, ans=0.2 2024-08-09 21:07:47,593 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 21:07:54,447 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 21:08:00,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197680.0, ans=0.1 2024-08-09 21:08:28,075 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 37 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 21:08:30,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5300, loss[loss=0.1354, beats_loss=0.01225, ecapa_loss=0.0003062, whisper_loss=0.1201, over 23915.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.0128, ecapa_loss=0.0003404, whisper_loss=0.1017, over 3859734.30 frames. 
], batch size: 91, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:08:33,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.918e+01 3.459e+01 4.148e+01 6.900e+01, threshold=6.919e+01, percent-clipped=2.0 2024-08-09 21:08:48,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=198080.0, ans=0.2 2024-08-09 21:09:09,539 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 21:09:09,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=198180.0, ans=0.09899494936611666 2024-08-09 21:09:11,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=198280.0, ans=0.125 2024-08-09 21:09:11,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=198280.0, ans=0.125 2024-08-09 21:09:18,022 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 21:09:21,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2024-08-09 21:09:33,507 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 21:09:33,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198380.0, ans=0.0 2024-08-09 21:09:39,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198480.0, ans=0.1 2024-08-09 21:09:40,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5350, loss[loss=0.09008, beats_loss=0.01542, ecapa_loss=0.0003161, whisper_loss=0.0715, over 15736.00 frames. 
], tot_loss[loss=0.1184, beats_loss=0.01271, ecapa_loss=0.0003384, whisper_loss=0.1023, over 3866930.29 frames. ], batch size: 65, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:09:40,500 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 21:09:43,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=198480.0, ans=0.125 2024-08-09 21:10:08,933 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 21:10:18,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=198680.0, ans=0.5 2024-08-09 21:10:26,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-08-09 21:10:26,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-09 21:10:46,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=198880.0, ans=0.125 2024-08-09 21:10:52,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5400, loss[loss=0.1119, beats_loss=0.01315, ecapa_loss=0.0003705, whisper_loss=0.09509, over 19400.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01269, ecapa_loss=0.0003385, whisper_loss=0.1023, over 3862696.05 frames. ], batch size: 79, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:10:55,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.905e+01 3.438e+01 3.898e+01 7.093e+01, threshold=6.876e+01, percent-clipped=1.0 2024-08-09 21:10:57,128 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 21:10:59,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198980.0, ans=0.1 2024-08-09 21:11:03,919 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 21:11:10,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=199080.0, ans=0.0 2024-08-09 21:11:28,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=199180.0, ans=0.0 2024-08-09 21:11:40,227 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-09 21:11:40,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2024-08-09 21:12:06,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5450, loss[loss=0.1332, beats_loss=0.009308, ecapa_loss=0.0003796, whisper_loss=0.1201, over 17812.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01262, ecapa_loss=0.0003415, whisper_loss=0.1027, over 3883823.68 frames. ], batch size: 69, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:12:19,685 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 21:12:21,100 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 21:12:39,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=199680.0, ans=0.2 2024-08-09 21:12:49,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.86 vs. 
limit=15.0 2024-08-09 21:13:08,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=199880.0, ans=0.05 2024-08-09 21:13:18,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5500, loss[loss=0.1205, beats_loss=0.01411, ecapa_loss=0.0003171, whisper_loss=0.1032, over 20127.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01266, ecapa_loss=0.0003404, whisper_loss=0.1026, over 3876135.64 frames. ], batch size: 79, lr: 2.61e-02, grad_scale: 16384.0 2024-08-09 21:13:22,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=199980.0, ans=0.125 2024-08-09 21:13:23,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 3.012e+01 3.355e+01 3.811e+01 5.286e+01, threshold=6.711e+01, percent-clipped=0.0 2024-08-09 21:13:52,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200180.0, ans=0.1 2024-08-09 21:14:12,910 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 21:14:19,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=200380.0, ans=0.125 2024-08-09 21:14:23,826 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 19 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-09 21:14:28,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=200380.0, ans=0.2 2024-08-09 21:14:32,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=200480.0, ans=0.2 2024-08-09 21:14:33,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5550, loss[loss=0.1192, beats_loss=0.0147, ecapa_loss=0.0003923, whisper_loss=0.1006, over 16439.00 frames. 
], tot_loss[loss=0.1184, beats_loss=0.01279, ecapa_loss=0.0003409, whisper_loss=0.1022, over 3888322.69 frames. ], batch size: 67, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:14:47,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-08-09 21:15:05,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=200680.0, ans=0.125 2024-08-09 21:15:09,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=200680.0, ans=0.0 2024-08-09 21:15:28,720 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-09 21:15:34,647 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 21:15:38,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=200880.0, ans=0.0 2024-08-09 21:15:43,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200880.0, ans=0.0 2024-08-09 21:15:43,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=200880.0, ans=0.125 2024-08-09 21:15:46,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5600, loss[loss=0.09436, beats_loss=0.01637, ecapa_loss=0.0003077, whisper_loss=0.07492, over 16308.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01272, ecapa_loss=0.000339, whisper_loss=0.1027, over 3899929.20 frames. 
], batch size: 66, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:15:47,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.626e+00 2024-08-09 21:15:49,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 3.019e+01 3.603e+01 4.139e+01 2.249e+02, threshold=7.206e+01, percent-clipped=7.0 2024-08-09 21:15:56,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200980.0, ans=0.125 2024-08-09 21:16:00,989 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:16:10,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=201080.0, ans=0.0 2024-08-09 21:16:11,962 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-09 21:16:14,624 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 21:16:19,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=201180.0, ans=0.0 2024-08-09 21:16:22,706 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 21:16:30,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=201280.0, ans=0.2 2024-08-09 21:16:56,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5650, loss[loss=0.1305, beats_loss=0.01384, ecapa_loss=0.000267, whisper_loss=0.1139, over 23474.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01281, ecapa_loss=0.0003366, whisper_loss=0.1016, over 3927592.19 frames. 
], batch size: 92, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:16:57,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=201480.0, ans=22.5 2024-08-09 21:17:12,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2024-08-09 21:17:27,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=201680.0, ans=0.0 2024-08-09 21:17:27,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=201680.0, ans=0.2 2024-08-09 21:17:30,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=201680.0, ans=0.2 2024-08-09 21:17:32,536 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 21:17:40,394 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 21:17:47,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201780.0, ans=0.125 2024-08-09 21:17:52,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=201880.0, ans=0.1 2024-08-09 21:17:53,409 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 21:17:54,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=201880.0, ans=0.125 2024-08-09 21:17:56,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=201880.0, ans=0.0 2024-08-09 21:18:03,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5700, loss[loss=0.1236, beats_loss=0.01205, ecapa_loss=0.0003817, whisper_loss=0.1078, over 19821.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01278, ecapa_loss=0.0003372, whisper_loss=0.1018, over 3950929.91 frames. ], batch size: 81, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:18:06,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 3.095e+01 3.448e+01 4.225e+01 7.062e+01, threshold=6.897e+01, percent-clipped=0.0 2024-08-09 21:18:18,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.25 vs. limit=22.5 2024-08-09 21:18:22,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=202080.0, ans=0.125 2024-08-09 21:18:29,225 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 21:18:35,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=202180.0, ans=0.125 2024-08-09 21:18:37,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=202180.0, ans=0.2 2024-08-09 21:18:48,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202280.0, ans=0.125 2024-08-09 21:18:53,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=202280.0, ans=0.0 2024-08-09 21:18:56,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.55 vs. limit=22.5 2024-08-09 21:19:09,323 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 21:19:10,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5750, loss[loss=0.09723, beats_loss=0.01381, ecapa_loss=0.0002635, whisper_loss=0.08079, over 15473.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01273, ecapa_loss=0.0003375, whisper_loss=0.1022, over 3948257.62 frames. ], batch size: 58, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:19:19,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=202480.0, ans=0.125 2024-08-09 21:19:22,837 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-09 21:19:30,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=202580.0, ans=0.0 2024-08-09 21:19:30,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. 
limit=15.0 2024-08-09 21:19:55,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-08-09 21:19:57,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202780.0, ans=0.125 2024-08-09 21:20:00,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=202780.0, ans=0.125 2024-08-09 21:20:17,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5800, loss[loss=0.1101, beats_loss=0.01314, ecapa_loss=0.0003953, whisper_loss=0.093, over 20698.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01278, ecapa_loss=0.0003386, whisper_loss=0.1018, over 3929512.34 frames. ], batch size: 91, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:20:20,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.100e+01 3.407e+01 4.370e+01 6.410e+01, threshold=6.814e+01, percent-clipped=0.0 2024-08-09 21:20:52,645 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-09 21:20:52,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=203180.0, ans=0.0 2024-08-09 21:21:14,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=203380.0, ans=0.125 2024-08-09 21:21:24,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5850, loss[loss=0.1429, beats_loss=0.01228, ecapa_loss=0.0003906, whisper_loss=0.1267, over 19924.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.0128, ecapa_loss=0.0003392, whisper_loss=0.1022, over 3925753.20 frames. 
], batch size: 81, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:21:25,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=203480.0, ans=0.05 2024-08-09 21:21:25,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.14 vs. limit=22.5 2024-08-09 21:21:47,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=203580.0, ans=0.0 2024-08-09 21:21:57,745 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 9 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 21:22:07,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=203780.0, ans=0.125 2024-08-09 21:22:19,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203880.0, ans=0.125 2024-08-09 21:22:20,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0 2024-08-09 21:22:31,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5900, loss[loss=0.125, beats_loss=0.00996, ecapa_loss=0.000326, whisper_loss=0.1118, over 20884.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01277, ecapa_loss=0.0003381, whisper_loss=0.1026, over 3923587.66 frames. ], batch size: 83, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:22:34,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.068e+01 3.370e+01 4.019e+01 7.434e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-09 21:22:36,848 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 21:22:51,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-09 21:22:56,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204080.0, ans=0.1 2024-08-09 21:22:58,373 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 21:22:59,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2024-08-09 21:23:09,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=204180.0, ans=0.125 2024-08-09 21:23:20,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.63 vs. limit=10.0 2024-08-09 21:23:25,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204380.0, ans=0.125 2024-08-09 21:23:29,860 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 21:23:34,219 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 21:23:34,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=204380.0, ans=0.125 2024-08-09 21:23:39,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 5950, loss[loss=0.08562, beats_loss=0.01489, ecapa_loss=0.0003316, whisper_loss=0.06741, over 18049.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01273, ecapa_loss=0.0003394, whisper_loss=0.1019, over 3904530.77 frames. 
], batch size: 73, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:23:55,102 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 21:24:44,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6000, loss[loss=0.1156, beats_loss=0.01404, ecapa_loss=0.0002556, whisper_loss=0.099, over 22876.00 frames. ], tot_loss[loss=0.118, beats_loss=0.0128, ecapa_loss=0.0003368, whisper_loss=0.1018, over 3909927.97 frames. ], batch size: 91, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:24:44,467 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 21:25:23,366 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.0956, 3.2325, 2.5613, 1.5780], device='cuda:3') 2024-08-09 21:25:26,047 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on ASR_libri: loss=0.2831, beats_loss=0, ecapa_loss=0.0009654, whisper_loss=0.2734, over 922467.00 frames. 2024-08-09 21:25:44,599 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on SV_voxceleb1: loss=0.008561, beats_loss=0, ecapa_loss=0.0008561, whisper_loss=0, over 939242.00 frames. 2024-08-09 21:27:41,203 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on AT_audioset: loss=0.03036, beats_loss=0.03036, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-09 21:27:41,207 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-09 21:27:43,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.831e+01 3.333e+01 3.565e+01 5.881e+01, threshold=6.666e+01, percent-clipped=0.0 2024-08-09 21:28:16,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=205180.0, ans=0.125 2024-08-09 21:28:28,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=205280.0, ans=0.0 2024-08-09 21:28:35,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205380.0, ans=0.1 2024-08-09 21:28:45,944 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 21:28:48,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6050, loss[loss=0.1147, beats_loss=0.01495, ecapa_loss=0.0003392, whisper_loss=0.09634, over 22826.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01275, ecapa_loss=0.0003344, whisper_loss=0.1023, over 3892169.02 frames. ], batch size: 93, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:28:48,791 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 21:28:50,128 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 21:28:51,349 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-09 21:28:56,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=205480.0, ans=0.0 2024-08-09 21:29:13,347 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 21:29:26,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=205780.0, ans=0.125 2024-08-09 21:29:38,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205780.0, ans=0.0 2024-08-09 21:29:47,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=205880.0, ans=0.0 2024-08-09 21:29:54,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6100, loss[loss=0.09633, beats_loss=0.01557, ecapa_loss=0.0003145, whisper_loss=0.07761, over 22744.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01271, ecapa_loss=0.0003375, whisper_loss=0.1021, over 3903802.89 frames. ], batch size: 94, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:29:55,084 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:29:57,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.058e+01 3.470e+01 4.090e+01 8.250e+01, threshold=6.939e+01, percent-clipped=1.0 2024-08-09 21:29:58,033 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 21:30:06,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=205980.0, ans=0.035 2024-08-09 21:30:07,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=206080.0, ans=0.125 2024-08-09 21:30:22,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.99 vs. 
limit=15.0 2024-08-09 21:30:30,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=206180.0, ans=0.125 2024-08-09 21:30:42,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=206280.0, ans=0.125 2024-08-09 21:30:54,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=206380.0, ans=0.0 2024-08-09 21:31:03,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6150, loss[loss=0.1098, beats_loss=0.0135, ecapa_loss=0.0004138, whisper_loss=0.09219, over 22337.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01272, ecapa_loss=0.0003366, whisper_loss=0.1022, over 3911905.21 frames. ], batch size: 94, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:31:16,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=206580.0, ans=0.0 2024-08-09 21:31:22,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-08-09 21:31:31,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=206680.0, ans=0.0 2024-08-09 21:31:48,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=206780.0, ans=0.2 2024-08-09 21:31:49,048 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 21:31:50,429 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 21:31:51,569 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 21:32:04,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.58 vs. limit=22.5 2024-08-09 21:32:10,582 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6200, loss[loss=0.1235, beats_loss=0.01153, ecapa_loss=0.0004022, whisper_loss=0.1079, over 20786.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01278, ecapa_loss=0.0003373, whisper_loss=0.1019, over 3917655.22 frames. ], batch size: 85, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:32:10,736 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 21:32:13,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 3.042e+01 3.611e+01 4.258e+01 6.640e+01, threshold=7.222e+01, percent-clipped=0.0 2024-08-09 21:32:16,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=206980.0, ans=0.02 2024-08-09 21:32:19,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=206980.0, ans=0.035 2024-08-09 21:32:27,022 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 21:32:28,264 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 21:32:35,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=15.0 2024-08-09 21:32:42,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=207180.0, ans=0.125 2024-08-09 21:32:43,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=207180.0, ans=0.0 2024-08-09 21:32:45,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=207180.0, ans=0.125 2024-08-09 21:32:52,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2024-08-09 21:32:53,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2024-08-09 21:33:02,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=207280.0, ans=0.0 2024-08-09 21:33:06,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=207380.0, ans=0.0 2024-08-09 21:33:08,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2024-08-09 21:33:18,284 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6250, loss[loss=0.112, beats_loss=0.01409, ecapa_loss=0.0003167, whisper_loss=0.09476, over 21649.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01278, ecapa_loss=0.0003391, whisper_loss=0.1017, over 3909965.39 frames. ], batch size: 88, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:33:31,317 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 21:33:45,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=207680.0, ans=0.0 2024-08-09 21:33:46,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=207680.0, ans=0.0 2024-08-09 21:33:48,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=207680.0, ans=10.0 2024-08-09 21:34:13,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-09 21:34:16,834 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 21:34:20,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=207880.0, ans=0.125 2024-08-09 21:34:27,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6300, loss[loss=0.1251, beats_loss=0.01283, ecapa_loss=0.0003619, whisper_loss=0.1087, over 16133.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.0127, ecapa_loss=0.0003396, whisper_loss=0.102, over 3901805.94 frames. 
], batch size: 66, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:34:30,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.893e+01 3.305e+01 3.810e+01 5.470e+01, threshold=6.610e+01, percent-clipped=0.0 2024-08-09 21:34:37,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=207980.0, ans=0.125 2024-08-09 21:34:47,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208080.0, ans=0.0 2024-08-09 21:35:00,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-09 21:35:19,802 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.833e+00 2024-08-09 21:35:32,165 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 21:35:35,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6350, loss[loss=0.09047, beats_loss=0.01641, ecapa_loss=0.0002622, whisper_loss=0.07144, over 22112.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01272, ecapa_loss=0.0003371, whisper_loss=0.1017, over 3886864.16 frames. ], batch size: 90, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:35:44,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=208480.0, ans=0.0 2024-08-09 21:35:46,490 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-09 21:36:04,757 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 21:36:08,333 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.518e-03 2024-08-09 21:36:08,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=208680.0, ans=0.125 2024-08-09 21:36:14,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=208680.0, ans=0.125 2024-08-09 21:36:27,536 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 21:36:32,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=208880.0, ans=0.2 2024-08-09 21:36:36,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2024-08-09 21:36:44,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2024-08-09 21:36:44,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6400, loss[loss=0.1083, beats_loss=0.01407, ecapa_loss=0.0003615, whisper_loss=0.09065, over 21423.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.0127, ecapa_loss=0.0003364, whisper_loss=0.1014, over 3897336.34 frames. ], batch size: 92, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:36:48,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 3.030e+01 3.423e+01 4.041e+01 6.749e+01, threshold=6.846e+01, percent-clipped=1.0 2024-08-09 21:37:03,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.91 vs. 
limit=15.0 2024-08-09 21:37:13,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=209180.0, ans=0.0 2024-08-09 21:37:54,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6450, loss[loss=0.1084, beats_loss=0.01434, ecapa_loss=0.0003283, whisper_loss=0.09077, over 21862.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01277, ecapa_loss=0.0003349, whisper_loss=0.1011, over 3901227.39 frames. ], batch size: 89, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:37:55,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-09 21:38:07,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5 2024-08-09 21:38:08,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=209580.0, ans=0.5 2024-08-09 21:38:16,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=209580.0, ans=0.125 2024-08-09 21:38:20,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=12.0 2024-08-09 21:38:21,478 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 21:38:27,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=209680.0, ans=0.0 2024-08-09 21:38:29,459 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 21:38:36,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2024-08-09 21:38:46,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=12.0 2024-08-09 21:38:46,703 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-09 21:38:52,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=209880.0, ans=0.04949747468305833 2024-08-09 21:39:04,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6500, loss[loss=0.07217, beats_loss=0.01808, ecapa_loss=0.0002611, whisper_loss=0.05148, over 16785.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01281, ecapa_loss=0.0003338, whisper_loss=0.1007, over 3898581.58 frames. ], batch size: 70, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:39:06,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2024-08-09 21:39:07,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.878e+01 3.238e+01 3.656e+01 8.439e+01, threshold=6.476e+01, percent-clipped=1.0 2024-08-09 21:39:39,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=210180.0, ans=0.125 2024-08-09 21:39:43,225 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 21:40:06,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. 
limit=22.5 2024-08-09 21:40:14,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6550, loss[loss=0.1322, beats_loss=0.009882, ecapa_loss=0.0003207, whisper_loss=0.1191, over 14432.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01274, ecapa_loss=0.0003334, whisper_loss=0.101, over 3913072.45 frames. ], batch size: 54, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:40:16,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=210480.0, ans=0.0 2024-08-09 21:40:21,163 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-09 21:40:24,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210480.0, ans=0.1 2024-08-09 21:40:25,288 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 21:40:28,207 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 21:40:57,167 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 21:40:59,915 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-09 21:41:15,721 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:41:18,240 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 21:41:22,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6600, loss[loss=0.113, beats_loss=0.01205, ecapa_loss=0.0003861, whisper_loss=0.09705, over 22150.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.0127, ecapa_loss=0.000335, whisper_loss=0.1014, over 3944261.53 frames. 
], batch size: 94, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:41:22,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=210980.0, ans=0.0 2024-08-09 21:41:24,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.037e+01 3.483e+01 4.077e+01 6.253e+01, threshold=6.966e+01, percent-clipped=0.0 2024-08-09 21:41:25,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=210980.0, ans=0.0 2024-08-09 21:41:27,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210980.0, ans=0.125 2024-08-09 21:41:28,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-09 21:41:41,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=211080.0, ans=0.04949747468305833 2024-08-09 21:41:50,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=211180.0, ans=0.0 2024-08-09 21:41:51,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-08-09 21:41:58,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=211180.0, ans=0.0 2024-08-09 21:42:06,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211280.0, ans=0.1 2024-08-09 21:42:13,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. 
limit=10.0 2024-08-09 21:42:17,513 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.548e-03 2024-08-09 21:42:31,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6650, loss[loss=0.1181, beats_loss=0.01093, ecapa_loss=0.0004016, whisper_loss=0.1031, over 19920.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01261, ecapa_loss=0.0003382, whisper_loss=0.1019, over 3930808.22 frames. ], batch size: 83, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:42:34,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-09 21:42:34,846 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 21:42:36,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=211480.0, ans=0.125 2024-08-09 21:42:40,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=211480.0, ans=0.125 2024-08-09 21:42:47,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=12.0 2024-08-09 21:42:51,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=211580.0, ans=0.125 2024-08-09 21:42:55,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=211580.0, ans=0.125 2024-08-09 21:43:08,621 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
24 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-09 21:43:10,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=211780.0, ans=0.07 2024-08-09 21:43:12,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-09 21:43:25,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211880.0, ans=0.1 2024-08-09 21:43:32,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=211880.0, ans=0.0 2024-08-09 21:43:38,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6700, loss[loss=0.128, beats_loss=0.01416, ecapa_loss=0.0003284, whisper_loss=0.1105, over 20878.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01267, ecapa_loss=0.000335, whisper_loss=0.1024, over 3927421.55 frames. ], batch size: 85, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:43:38,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=211980.0, ans=0.125 2024-08-09 21:43:41,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.049e+01 3.429e+01 4.303e+01 7.619e+01, threshold=6.858e+01, percent-clipped=1.0 2024-08-09 21:43:41,164 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 21:43:59,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=212080.0, ans=0.0 2024-08-09 21:44:07,401 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 21:44:08,719 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-09 21:44:09,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=212180.0, ans=0.2 2024-08-09 21:44:14,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=212180.0, ans=0.125 2024-08-09 21:44:22,646 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:44:47,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6750, loss[loss=0.119, beats_loss=0.01208, ecapa_loss=0.0003408, whisper_loss=0.1035, over 15356.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01265, ecapa_loss=0.0003365, whisper_loss=0.1025, over 3882564.31 frames. ], batch size: 60, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:44:49,479 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 21:45:37,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=212780.0, ans=0.2 2024-08-09 21:45:48,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=212880.0, ans=0.125 2024-08-09 21:45:53,822 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 21:45:55,110 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 21:45:56,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6800, loss[loss=0.09789, beats_loss=0.0143, ecapa_loss=0.0003184, whisper_loss=0.0804, over 16465.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01262, ecapa_loss=0.0003356, whisper_loss=0.1025, over 3847950.49 frames. 
], batch size: 67, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:45:58,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.928e+01 3.409e+01 4.100e+01 8.566e+01, threshold=6.819e+01, percent-clipped=2.0 2024-08-09 21:46:07,286 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 21:46:10,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-09 21:46:11,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=213080.0, ans=0.0 2024-08-09 21:46:21,694 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 21:46:22,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213180.0, ans=0.1 2024-08-09 21:46:31,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=213180.0, ans=0.125 2024-08-09 21:46:35,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=213280.0, ans=0.0 2024-08-09 21:46:47,352 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 21:46:57,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=213380.0, ans=0.125 2024-08-09 21:46:59,833 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
35 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 21:47:00,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213380.0, ans=0.1 2024-08-09 21:47:03,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6850, loss[loss=0.1142, beats_loss=0.0143, ecapa_loss=0.0003068, whisper_loss=0.09681, over 19508.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01266, ecapa_loss=0.0003359, whisper_loss=0.1021, over 3856330.83 frames. ], batch size: 74, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:47:15,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=213480.0, ans=0.125 2024-08-09 21:47:19,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=213580.0, ans=0.125 2024-08-09 21:47:24,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=213580.0, ans=0.0 2024-08-09 21:47:41,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=213680.0, ans=0.0 2024-08-09 21:48:03,309 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 21:48:07,277 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 21:48:10,960 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6900, loss[loss=0.137, beats_loss=0.008911, ecapa_loss=0.0002754, whisper_loss=0.1253, over 16590.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01271, ecapa_loss=0.0003378, whisper_loss=0.1008, over 3828254.65 frames. 
], batch size: 59, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:48:13,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.002e+01 3.455e+01 4.166e+01 7.035e+01, threshold=6.909e+01, percent-clipped=1.0 2024-08-09 21:48:45,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=214180.0, ans=0.125 2024-08-09 21:48:50,653 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 21:49:17,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 6950, loss[loss=0.1304, beats_loss=0.01411, ecapa_loss=0.0003221, whisper_loss=0.1131, over 22256.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01281, ecapa_loss=0.0003349, whisper_loss=0.1011, over 3849278.42 frames. ], batch size: 88, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:49:22,066 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 21:49:27,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=214480.0, ans=0.0 2024-08-09 21:49:35,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=214580.0, ans=0.05 2024-08-09 21:49:36,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2024-08-09 21:49:39,460 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 21:49:44,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=214680.0, ans=0.2 2024-08-09 21:50:11,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. 
limit=15.0 2024-08-09 21:50:14,306 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 21:50:16,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-09 21:50:17,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214880.0, ans=0.125 2024-08-09 21:50:23,270 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 21:50:24,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7000, loss[loss=0.09807, beats_loss=0.01148, ecapa_loss=0.0002653, whisper_loss=0.08394, over 14598.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01279, ecapa_loss=0.0003342, whisper_loss=0.1009, over 3862918.39 frames. ], batch size: 53, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:50:24,545 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 21:50:27,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.842e+01 3.336e+01 4.058e+01 9.243e+01, threshold=6.672e+01, percent-clipped=2.0 2024-08-09 21:50:30,136 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-09 21:50:43,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215080.0, ans=0.125 2024-08-09 21:50:44,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. 
limit=15.0 2024-08-09 21:50:45,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=215080.0, ans=0.2 2024-08-09 21:50:53,605 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-09 21:51:11,262 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-09 21:51:27,549 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 21:51:29,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=215380.0, ans=0.0 2024-08-09 21:51:33,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7050, loss[loss=0.09714, beats_loss=0.01312, ecapa_loss=0.0002877, whisper_loss=0.08114, over 19938.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01281, ecapa_loss=0.0003344, whisper_loss=0.101, over 3879174.85 frames. ], batch size: 79, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:51:39,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=215480.0, ans=0.125 2024-08-09 21:51:42,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. 
limit=15.0 2024-08-09 21:51:47,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=215580.0, ans=0.125 2024-08-09 21:51:54,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=215580.0, ans=0.125 2024-08-09 21:51:54,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=215580.0, ans=0.125 2024-08-09 21:51:59,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=215680.0, ans=0.0 2024-08-09 21:51:59,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=215680.0, ans=0.2 2024-08-09 21:52:04,458 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 21:52:16,042 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-09 21:52:22,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=215780.0, ans=0.0 2024-08-09 21:52:39,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-09 21:52:41,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7100, loss[loss=0.1258, beats_loss=0.01414, ecapa_loss=0.000245, whisper_loss=0.1092, over 20696.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01281, ecapa_loss=0.0003318, whisper_loss=0.1008, over 3859434.61 frames. ], batch size: 78, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:52:41,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. 
limit=6.0 2024-08-09 21:52:43,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.849e+01 3.267e+01 3.796e+01 6.737e+01, threshold=6.534e+01, percent-clipped=1.0 2024-08-09 21:52:49,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-08-09 21:52:53,953 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 21:53:00,782 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 21:53:02,240 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 21:53:15,362 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 21:53:23,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=216280.0, ans=0.2 2024-08-09 21:53:24,769 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 21:53:43,234 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 21:53:44,574 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 21:53:48,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7150, loss[loss=0.1015, beats_loss=0.01513, ecapa_loss=0.0003041, whisper_loss=0.08333, over 18612.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01278, ecapa_loss=0.0003306, whisper_loss=0.1011, over 3832317.69 frames. ], batch size: 77, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:53:49,924 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 21:53:52,730 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 21:53:54,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216480.0, ans=0.125 2024-08-09 21:53:57,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=216480.0, ans=0.125 2024-08-09 21:54:17,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=216680.0, ans=0.09899494936611666 2024-08-09 21:54:21,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:23,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=216680.0, ans=0.0 2024-08-09 21:54:23,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:26,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=216680.0, ans=0.2 2024-08-09 21:54:26,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216680.0, ans=0.1 2024-08-09 21:54:33,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=12.0 2024-08-09 21:54:35,410 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 21:54:48,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216880.0, ans=0.1 2024-08-09 21:54:54,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7200, loss[loss=0.1212, beats_loss=0.01191, ecapa_loss=0.0003995, whisper_loss=0.1053, over 22196.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01282, ecapa_loss=0.0003311, whisper_loss=0.1008, over 3868619.63 frames. ], batch size: 93, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:54:57,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 3.192e+01 3.694e+01 4.293e+01 6.634e+01, threshold=7.388e+01, percent-clipped=1.0 2024-08-09 21:55:01,397 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-09 21:55:01,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=216980.0, ans=0.125 2024-08-09 21:55:04,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=216980.0, ans=0.0 2024-08-09 21:55:05,131 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 21:55:17,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217080.0, ans=0.1 2024-08-09 21:55:21,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=217180.0, ans=0.125 2024-08-09 21:55:23,815 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 21:55:25,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217180.0, ans=0.1 2024-08-09 21:55:28,269 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 21:55:39,061 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 21:55:51,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217380.0, ans=0.1 2024-08-09 21:55:55,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217380.0, ans=0.125 2024-08-09 21:56:00,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7250, loss[loss=0.1337, beats_loss=0.01229, ecapa_loss=0.0002963, whisper_loss=0.1184, over 23585.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01271, ecapa_loss=0.0003338, whisper_loss=0.1013, over 3881076.68 frames. ], batch size: 90, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:56:09,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=1.558e-02 2024-08-09 21:56:13,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217580.0, ans=0.125 2024-08-09 21:56:22,169 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 21:56:35,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217680.0, ans=0.125 2024-08-09 21:56:37,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217680.0, ans=0.1 2024-08-09 21:56:38,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217680.0, ans=0.125 2024-08-09 21:56:53,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=217880.0, ans=0.035 2024-08-09 21:56:54,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-08-09 21:57:07,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7300, loss[loss=0.1022, beats_loss=0.01554, ecapa_loss=0.0002712, whisper_loss=0.08392, over 21145.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01273, ecapa_loss=0.0003329, whisper_loss=0.1017, over 3880677.72 frames. ], batch size: 85, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:57:10,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 3.021e+01 3.524e+01 4.153e+01 7.749e+01, threshold=7.049e+01, percent-clipped=1.0 2024-08-09 21:57:31,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0 2024-08-09 21:57:38,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.01 vs. 
limit=12.0 2024-08-09 21:57:49,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=218280.0, ans=0.09899494936611666 2024-08-09 21:57:56,611 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 21:58:00,746 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 21:58:04,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=218380.0, ans=0.125 2024-08-09 21:58:09,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=218380.0, ans=15.0 2024-08-09 21:58:15,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7350, loss[loss=0.081, beats_loss=0.01485, ecapa_loss=0.0003019, whisper_loss=0.06313, over 13451.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01276, ecapa_loss=0.0003328, whisper_loss=0.1014, over 3895489.93 frames. ], batch size: 55, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:58:26,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=218480.0, ans=0.125 2024-08-09 21:58:39,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=218580.0, ans=0.035 2024-08-09 21:58:44,086 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 21:58:47,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=218680.0, ans=0.0 2024-08-09 21:58:58,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. 
limit=15.0 2024-08-09 21:59:13,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5 2024-08-09 21:59:21,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=218980.0, ans=0.125 2024-08-09 21:59:22,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7400, loss[loss=0.1123, beats_loss=0.01329, ecapa_loss=0.0003166, whisper_loss=0.09581, over 20979.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01272, ecapa_loss=0.0003332, whisper_loss=0.1013, over 3914627.33 frames. ], batch size: 82, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:59:24,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.912e+01 3.245e+01 3.982e+01 7.444e+01, threshold=6.489e+01, percent-clipped=1.0 2024-08-09 21:59:28,752 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 21:59:32,638 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 31 from Vox, 23 fro AS 2024-08-09 21:59:47,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=219180.0, ans=0.0 2024-08-09 21:59:48,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219180.0, ans=0.125 2024-08-09 21:59:55,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-09 22:00:01,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=219280.0, ans=0.0 2024-08-09 22:00:24,706 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
34 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 22:00:27,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7450, loss[loss=0.1122, beats_loss=0.01334, ecapa_loss=0.0002207, whisper_loss=0.0967, over 16747.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01273, ecapa_loss=0.0003312, whisper_loss=0.1012, over 3922871.36 frames. ], batch size: 58, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:00:53,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.28 vs. limit=22.5 2024-08-09 22:00:55,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=15.0 2024-08-09 22:01:01,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219680.0, ans=0.1 2024-08-09 22:01:09,155 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 22:01:24,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=219880.0, ans=0.07 2024-08-09 22:01:29,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=219880.0, ans=0.125 2024-08-09 22:01:32,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7500, loss[loss=0.1229, beats_loss=0.01132, ecapa_loss=0.0003327, whisper_loss=0.1082, over 19484.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01269, ecapa_loss=0.0003317, whisper_loss=0.1014, over 3920060.88 frames. ], batch size: 77, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:01:32,282 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 22:01:34,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.195e+01 3.556e+01 4.126e+01 6.406e+01, threshold=7.112e+01, percent-clipped=0.0 2024-08-09 22:01:45,421 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 22:01:53,446 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 22:02:11,816 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 22:02:22,685 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 22:02:27,739 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 22:02:38,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7550, loss[loss=0.1025, beats_loss=0.01357, ecapa_loss=0.0003425, whisper_loss=0.08547, over 17532.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01269, ecapa_loss=0.0003307, whisper_loss=0.1014, over 3912436.45 frames. ], batch size: 71, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:02:40,071 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 22:02:44,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220480.0, ans=0.1 2024-08-09 22:02:46,348 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 22:03:13,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-08-09 22:03:15,844 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-09 22:03:17,060 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 22:03:25,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-09 22:03:28,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2024-08-09 22:03:43,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7600, loss[loss=0.1002, beats_loss=0.0147, ecapa_loss=0.0003032, whisper_loss=0.08247, over 19618.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01276, ecapa_loss=0.0003293, whisper_loss=0.1011, over 3934039.87 frames. ], batch size: 80, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:03:46,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.898e+01 3.243e+01 3.786e+01 9.374e+01, threshold=6.487e+01, percent-clipped=2.0 2024-08-09 22:04:00,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2024-08-09 22:04:08,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221080.0, ans=0.125 2024-08-09 22:04:24,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-08-09 22:04:26,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=221280.0, ans=15.0 2024-08-09 22:04:51,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7650, loss[loss=0.1124, beats_loss=0.01228, ecapa_loss=0.0002864, whisper_loss=0.09723, over 14902.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01274, ecapa_loss=0.0003282, whisper_loss=0.1014, over 3922476.75 frames. 
], batch size: 58, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:05:03,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=221480.0, ans=0.0 2024-08-09 22:05:10,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:10,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:28,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2024-08-09 22:05:40,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2024-08-09 22:06:05,845 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-09 22:06:13,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7700, loss[loss=0.09816, beats_loss=0.01376, ecapa_loss=0.0003829, whisper_loss=0.08058, over 21476.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01273, ecapa_loss=0.0003284, whisper_loss=0.1013, over 3949561.81 frames. 
], batch size: 92, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:06:15,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.870e+01 3.289e+01 3.671e+01 6.131e+01, threshold=6.578e+01, percent-clipped=0.0 2024-08-09 22:06:28,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=221980.0, ans=0.2 2024-08-09 22:06:39,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=222080.0, ans=0.0 2024-08-09 22:06:42,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=222080.0, ans=0.125 2024-08-09 22:07:26,205 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 22:07:59,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7750, loss[loss=0.1269, beats_loss=0.01248, ecapa_loss=0.0004202, whisper_loss=0.1102, over 18012.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01278, ecapa_loss=0.0003308, whisper_loss=0.1009, over 3952425.31 frames. ], batch size: 76, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:08:05,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=222480.0, ans=0.125 2024-08-09 22:08:18,978 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 22:08:22,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=222580.0, ans=0.07 2024-08-09 22:08:32,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.06 vs. 
limit=15.0 2024-08-09 22:08:33,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=222680.0, ans=0.0 2024-08-09 22:08:40,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-09 22:08:51,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=222780.0, ans=0.0 2024-08-09 22:09:07,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222880.0, ans=0.1 2024-08-09 22:09:08,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=222880.0, ans=0.2 2024-08-09 22:09:16,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7800, loss[loss=0.1207, beats_loss=0.01422, ecapa_loss=0.0002826, whisper_loss=0.1036, over 23396.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01277, ecapa_loss=0.0003293, whisper_loss=0.1016, over 3968811.65 frames. ], batch size: 94, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:09:19,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.196e+01 3.636e+01 4.618e+01 8.254e+01, threshold=7.273e+01, percent-clipped=2.0 2024-08-09 22:09:25,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=222980.0, ans=0.125 2024-08-09 22:09:30,128 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 22:09:53,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=223180.0, ans=0.09899494936611666 2024-08-09 22:10:04,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=223280.0, ans=0.125 2024-08-09 22:10:05,338 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 22:10:27,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-09 22:10:32,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7850, loss[loss=0.1067, beats_loss=0.01464, ecapa_loss=0.0003545, whisper_loss=0.08847, over 17500.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01276, ecapa_loss=0.0003284, whisper_loss=0.1015, over 3933731.13 frames. ], batch size: 74, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:11:01,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=223680.0, ans=0.09899494936611666 2024-08-09 22:11:11,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2024-08-09 22:11:18,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=223780.0, ans=15.0 2024-08-09 22:11:35,810 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 22:11:39,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=223880.0, ans=0.5 2024-08-09 22:11:47,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7900, loss[loss=0.09061, beats_loss=0.01566, ecapa_loss=0.0002202, whisper_loss=0.07276, over 23475.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01285, ecapa_loss=0.0003267, whisper_loss=0.1011, over 3915927.32 frames. ], batch size: 91, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:11:50,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.925e+01 3.324e+01 4.014e+01 6.320e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-09 22:12:02,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224080.0, ans=0.1 2024-08-09 22:12:22,632 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 22:12:24,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=224180.0, ans=0.2 2024-08-09 22:12:24,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=224180.0, ans=0.125 2024-08-09 22:12:27,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224180.0, ans=0.1 2024-08-09 22:12:29,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=224180.0, ans=0.2 2024-08-09 22:12:38,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=224280.0, ans=0.0 2024-08-09 22:12:51,051 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 22:13:01,765 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-09 22:13:05,490 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 22:13:06,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 7950, loss[loss=0.1075, beats_loss=0.01265, ecapa_loss=0.0003224, whisper_loss=0.09162, over 18853.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01276, ecapa_loss=0.0003258, whisper_loss=0.1012, over 3909552.62 frames. ], batch size: 73, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:13:33,337 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 22:13:55,611 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 22:14:02,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-09 22:14:05,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=224880.0, ans=0.2 2024-08-09 22:14:15,045 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-09 22:14:19,487 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 22:14:20,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8000, loss[loss=0.1086, beats_loss=0.01521, ecapa_loss=0.000318, whisper_loss=0.09017, over 22226.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01268, ecapa_loss=0.0003253, whisper_loss=0.101, over 3933184.11 frames. 
], batch size: 90, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:14:23,863 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 3.124e+01 3.387e+01 3.961e+01 6.094e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-09 22:14:26,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=224980.0, ans=0.0 2024-08-09 22:14:30,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=224980.0, ans=0.0 2024-08-09 22:14:35,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=225080.0, ans=0.09899494936611666 2024-08-09 22:14:45,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=225080.0, ans=0.04949747468305833 2024-08-09 22:14:48,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=225080.0, ans=0.0 2024-08-09 22:14:55,038 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 22:15:10,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225280.0, ans=0.125 2024-08-09 22:15:35,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8050, loss[loss=0.1225, beats_loss=0.0131, ecapa_loss=0.0004018, whisper_loss=0.1054, over 21907.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01261, ecapa_loss=0.0003255, whisper_loss=0.1014, over 3926184.39 frames. 
], batch size: 92, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:15:35,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=225480.0, ans=0.125 2024-08-09 22:15:57,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=225580.0, ans=0.125 2024-08-09 22:16:10,218 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 22:16:27,005 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.555e+04 2024-08-09 22:16:33,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2024-08-09 22:16:50,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8100, loss[loss=0.1135, beats_loss=0.009066, ecapa_loss=0.0004287, whisper_loss=0.1002, over 16973.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01255, ecapa_loss=0.0003249, whisper_loss=0.1012, over 3908929.40 frames. ], batch size: 71, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:16:52,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=225980.0, ans=0.05 2024-08-09 22:16:53,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.949e+01 3.347e+01 3.946e+01 6.724e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-09 22:17:07,671 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 22:17:09,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.26 vs. 
limit=12.0 2024-08-09 22:17:10,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-09 22:17:21,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=226180.0, ans=0.125 2024-08-09 22:17:23,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=226180.0, ans=0.125 2024-08-09 22:17:28,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=226180.0, ans=0.95 2024-08-09 22:17:39,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=226280.0, ans=0.0 2024-08-09 22:17:44,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-08-09 22:18:05,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8150, loss[loss=0.08496, beats_loss=0.01241, ecapa_loss=0.0003001, whisper_loss=0.06955, over 14516.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01257, ecapa_loss=0.000326, whisper_loss=0.1008, over 3888015.33 frames. ], batch size: 57, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:18:21,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226580.0, ans=0.1 2024-08-09 22:18:23,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.89 vs. 
limit=15.0 2024-08-09 22:18:30,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=226580.0, ans=0.2 2024-08-09 22:18:48,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226680.0, ans=0.1 2024-08-09 22:19:12,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=226880.0, ans=0.125 2024-08-09 22:19:23,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8200, loss[loss=0.08885, beats_loss=0.01072, ecapa_loss=0.0003605, whisper_loss=0.07453, over 21093.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01255, ecapa_loss=0.0003265, whisper_loss=0.1015, over 3925709.95 frames. ], batch size: 83, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:19:24,809 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 22:19:25,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 3.072e+01 3.518e+01 4.235e+01 6.207e+01, threshold=7.036e+01, percent-clipped=0.0 2024-08-09 22:19:37,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=227080.0, ans=0.125 2024-08-09 22:19:39,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=227080.0, ans=0.0 2024-08-09 22:19:47,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227080.0, ans=0.125 2024-08-09 22:19:55,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=227180.0, ans=0.0 2024-08-09 22:19:58,007 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 22:19:58,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2024-08-09 22:19:59,479 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 22:20:09,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227280.0, ans=0.1 2024-08-09 22:20:11,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=227280.0, ans=0.0 2024-08-09 22:20:17,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=12.0 2024-08-09 22:20:20,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2024-08-09 22:20:31,776 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 22:20:33,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227380.0, ans=0.1 2024-08-09 22:20:39,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8250, loss[loss=0.1231, beats_loss=0.01306, ecapa_loss=0.000275, whisper_loss=0.1073, over 17866.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01254, ecapa_loss=0.0003278, whisper_loss=0.1018, over 3925090.39 frames. ], batch size: 68, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:07,357 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
12 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 22:21:53,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=227880.0, ans=0.125 2024-08-09 22:21:56,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8300, loss[loss=0.09189, beats_loss=0.01702, ecapa_loss=0.0002122, whisper_loss=0.07275, over 17335.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01266, ecapa_loss=0.0003271, whisper_loss=0.1009, over 3911172.22 frames. ], batch size: 68, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:56,701 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 22:21:59,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.849e+01 3.182e+01 3.709e+01 5.211e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-09 22:21:59,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=227980.0, ans=0.0 2024-08-09 22:22:03,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=227980.0, ans=0.125 2024-08-09 22:22:05,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=227980.0, ans=0.0 2024-08-09 22:22:14,753 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 22:22:15,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=228080.0, ans=0.125 2024-08-09 22:23:03,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=228380.0, ans=0.125 2024-08-09 22:23:10,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8350, loss[loss=0.1236, beats_loss=0.01052, ecapa_loss=0.0003631, whisper_loss=0.1095, over 22518.00 frames. 
], tot_loss[loss=0.1169, beats_loss=0.01267, ecapa_loss=0.0003268, whisper_loss=0.101, over 3916066.86 frames. ], batch size: 93, lr: 2.46e-02, grad_scale: 65536.0
2024-08-09 22:23:14,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=228480.0, ans=0.05
2024-08-09 22:23:14,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0
2024-08-09 22:23:15,028 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS
2024-08-09 22:23:16,071 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-09 22:23:16,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=228480.0, ans=0.2
2024-08-09 22:23:25,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=228580.0, ans=0.125
2024-08-09 22:23:26,571 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 from AS
2024-08-09 22:23:27,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=228580.0, ans=0.5
2024-08-09 22:23:38,817 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
24 from LS+wenet, 24 from Vox, 30 from AS
2024-08-09 22:23:42,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=228680.0, ans=0.125
2024-08-09 22:23:45,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=228680.0, ans=0.125
2024-08-09 22:23:47,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=228680.0, ans=0.09899494936611666
2024-08-09 22:24:05,843 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 29 from LS+wenet, 8 from Vox, 26 from AS
2024-08-09 22:24:06,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=228780.0, ans=0.125
2024-08-09 22:24:17,849 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 from AS
2024-08-09 22:24:26,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8400, loss[loss=0.0904, beats_loss=0.01444, ecapa_loss=0.0003639, whisper_loss=0.07232, over 17673.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01255, ecapa_loss=0.0003296, whisper_loss=0.1023, over 3920520.75 frames. ], batch size: 74, lr: 2.46e-02, grad_scale: 65536.0
2024-08-09 22:24:29,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.962e+01 3.410e+01 4.213e+01 6.836e+01, threshold=6.819e+01, percent-clipped=3.0
2024-08-09 22:25:01,221 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
27 from LS+wenet, 23 from Vox, 39 from AS
2024-08-09 22:25:03,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=229180.0, ans=0.0
2024-08-09 22:25:04,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=229180.0, ans=0.0
2024-08-09 22:25:07,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0
2024-08-09 22:25:09,079 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS
2024-08-09 22:25:36,250 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-09 22:25:42,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8450, loss[loss=0.1166, beats_loss=0.0112, ecapa_loss=0.000318, whisper_loss=0.1022, over 20692.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01246, ecapa_loss=0.0003298, whisper_loss=0.1022, over 3877589.58 frames. ], batch size: 80, lr: 2.46e-02, grad_scale: 65536.0
2024-08-09 22:25:49,076 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS
2024-08-09 22:25:50,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229480.0, ans=0.1
2024-08-09 22:26:08,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=229580.0, ans=0.125
2024-08-09 22:26:12,066 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 from AS
2024-08-09 22:26:33,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=15.0
2024-08-09 22:26:44,246 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
25 from LS+wenet, 25 from Vox, 18 from AS
2024-08-09 22:26:58,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8500, loss[loss=0.09765, beats_loss=0.01709, ecapa_loss=0.0002769, whisper_loss=0.0778, over 15999.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01249, ecapa_loss=0.0003311, whisper_loss=0.1016, over 3858860.12 frames. ], batch size: 62, lr: 2.46e-02, grad_scale: 65536.0
2024-08-09 22:27:00,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 3.102e+01 3.448e+01 4.001e+01 5.719e+01, threshold=6.896e+01, percent-clipped=0.0
2024-08-09 22:27:12,944 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-09 22:27:29,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=230180.0, ans=0.0
2024-08-09 22:27:32,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=230180.0, ans=0.07
2024-08-09 22:27:34,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0
2024-08-09 22:27:44,639 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 from AS
2024-08-09 22:27:47,536 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
39 from LS+wenet, 19 from Vox, 32 from AS
2024-08-09 22:27:47,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=230280.0, ans=0.125
2024-08-09 22:27:54,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230280.0, ans=0.1
2024-08-09 22:28:05,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230380.0, ans=0.1
2024-08-09 22:28:12,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8550, loss[loss=0.1218, beats_loss=0.01287, ecapa_loss=0.0003011, whisper_loss=0.1059, over 20661.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01242, ecapa_loss=0.0003305, whisper_loss=0.1021, over 3870947.70 frames. ], batch size: 81, lr: 2.45e-02, grad_scale: 65536.0
2024-08-09 22:28:13,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5
2024-08-09 22:28:14,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=230480.0, ans=0.0
2024-08-09 22:28:31,874 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 from AS
2024-08-09 22:28:47,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=230680.0, ans=0.125
2024-08-09 22:28:58,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=230780.0, ans=0.125
2024-08-09 22:29:02,086 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
25 from LS+wenet, 26 from Vox, 42 from AS
2024-08-09 22:29:06,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=230780.0, ans=0.02
2024-08-09 22:29:08,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230780.0, ans=0.125
2024-08-09 22:29:09,137 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 15 from Vox, 48 from AS
2024-08-09 22:29:18,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230880.0, ans=0.125
2024-08-09 22:29:24,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=230880.0, ans=0.5
2024-08-09 22:29:26,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8600, loss[loss=0.1402, beats_loss=0.00956, ecapa_loss=0.0004235, whisper_loss=0.1264, over 14418.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01243, ecapa_loss=0.0003306, whisper_loss=0.1017, over 3847122.82 frames. ], batch size: 60, lr: 2.45e-02, grad_scale: 65536.0
2024-08-09 22:29:27,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0
2024-08-09 22:29:29,297 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.896e+01 3.419e+01 4.251e+01 8.504e+01, threshold=6.839e+01, percent-clipped=1.0
2024-08-09 22:29:35,828 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.007e+00
2024-08-09 22:29:36,967 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
17 from LS+wenet, 18 from Vox, 33 from AS
2024-08-09 22:29:42,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231080.0, ans=0.125
2024-08-09 22:29:56,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=231180.0, ans=0.2
2024-08-09 22:30:05,218 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 from AS
2024-08-09 22:30:09,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=231280.0, ans=0.125
2024-08-09 22:30:36,854 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 from AS
2024-08-09 22:30:38,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231480.0, ans=0.1
2024-08-09 22:30:39,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8650, loss[loss=0.1327, beats_loss=0.01132, ecapa_loss=0.0003569, whisper_loss=0.1178, over 21484.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01247, ecapa_loss=0.0003304, whisper_loss=0.1015, over 3842381.33 frames. ], batch size: 86, lr: 2.45e-02, grad_scale: 65536.0
2024-08-09 22:30:57,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=231580.0, ans=0.125
2024-08-09 22:31:00,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0
2024-08-09 22:31:19,641 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-09 22:31:22,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=231780.0, ans=0.95
2024-08-09 22:31:27,906 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 from AS
2024-08-09 22:31:28,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231780.0, ans=0.1
2024-08-09 22:31:50,618 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 from AS
2024-08-09 22:31:51,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8700, loss[loss=0.1248, beats_loss=0.01039, ecapa_loss=0.0003577, whisper_loss=0.1108, over 20235.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01249, ecapa_loss=0.0003299, whisper_loss=0.1015, over 3828328.74 frames. ], batch size: 82, lr: 2.45e-02, grad_scale: 65536.0
2024-08-09 22:31:54,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 3.097e+01 3.569e+01 4.188e+01 5.734e+01, threshold=7.139e+01, percent-clipped=0.0
2024-08-09 22:32:12,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232080.0, ans=0.1
2024-08-09 22:32:39,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.72 vs. limit=15.0
2024-08-09 22:32:41,942 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-09 22:32:44,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=232280.0, ans=0.125
2024-08-09 22:32:47,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.87 vs.
limit=12.0
2024-08-09 22:32:52,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=232380.0, ans=0.125
2024-08-09 22:33:07,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8750, loss[loss=0.1361, beats_loss=0.01169, ecapa_loss=0.0003765, whisper_loss=0.1206, over 17163.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01249, ecapa_loss=0.0003289, whisper_loss=0.1019, over 3850877.35 frames. ], batch size: 67, lr: 2.44e-02, grad_scale: 65536.0
2024-08-09 22:33:26,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0
2024-08-09 22:33:33,711 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS
2024-08-09 22:33:39,504 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS
2024-08-09 22:33:51,139 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 27 from Vox, 29 from AS
2024-08-09 22:33:51,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0
2024-08-09 22:34:02,893 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 from AS
2024-08-09 22:34:19,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8800, loss[loss=0.1213, beats_loss=0.01301, ecapa_loss=0.0003784, whisper_loss=0.1045, over 22234.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01254, ecapa_loss=0.0003273, whisper_loss=0.1013, over 3840447.51 frames.
], batch size: 91, lr: 2.44e-02, grad_scale: 65536.0
2024-08-09 22:34:23,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 3.102e+01 3.612e+01 4.206e+01 6.577e+01, threshold=7.224e+01, percent-clipped=0.0
2024-08-09 22:34:36,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233080.0, ans=0.1
2024-08-09 22:34:43,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5
2024-08-09 22:34:54,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=233180.0, ans=0.125
2024-08-09 22:34:56,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=233180.0, ans=0.125
2024-08-09 22:35:14,369 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-09 22:35:19,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=233380.0, ans=0.0
2024-08-09 22:35:34,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8850, loss[loss=0.09501, beats_loss=0.01067, ecapa_loss=0.0004318, whisper_loss=0.08002, over 18410.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0126, ecapa_loss=0.0003252, whisper_loss=0.101, over 3869755.82 frames. ], batch size: 75, lr: 2.44e-02, grad_scale: 65536.0
2024-08-09 22:35:43,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=233480.0, ans=0.125
2024-08-09 22:35:46,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=233480.0, ans=0.0
2024-08-09 22:35:50,732 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
25 from LS+wenet, 25 from Vox, 41 from AS
2024-08-09 22:35:53,498 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 23 from Vox, 44 from AS
2024-08-09 22:36:02,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0
2024-08-09 22:36:07,694 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 22:36:17,619 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS
2024-08-09 22:36:17,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=233780.0, ans=0.125
2024-08-09 22:36:20,025 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS
2024-08-09 22:36:20,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0
2024-08-09 22:36:24,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=233780.0, ans=0.125
2024-08-09 22:36:38,468 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 from AS
2024-08-09 22:36:41,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0
2024-08-09 22:36:45,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8900, loss[loss=0.09476, beats_loss=0.01597, ecapa_loss=0.0002603, whisper_loss=0.07619, over 16449.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.0126, ecapa_loss=0.000323, whisper_loss=0.1004, over 3824714.63 frames.
], batch size: 64, lr: 2.44e-02, grad_scale: 65536.0
2024-08-09 22:36:47,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.807e+01 3.249e+01 3.699e+01 6.208e+01, threshold=6.498e+01, percent-clipped=0.0
2024-08-09 22:36:48,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=233980.0, ans=0.125
2024-08-09 22:36:53,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=233980.0, ans=0.0
2024-08-09 22:36:57,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2024-08-09 22:36:58,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234080.0, ans=0.1
2024-08-09 22:37:24,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0
2024-08-09 22:37:25,646 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 22:37:29,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=234280.0, ans=0.0
2024-08-09 22:37:47,493 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 from AS
2024-08-09 22:37:55,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0
2024-08-09 22:37:56,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 8950, loss[loss=0.1393, beats_loss=0.01033, ecapa_loss=0.0003612, whisper_loss=0.1253, over 22386.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01265, ecapa_loss=0.0003211, whisper_loss=0.1003, over 3857743.02 frames.
], batch size: 88, lr: 2.44e-02, grad_scale: 65536.0
2024-08-09 22:38:11,480 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 22:38:19,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=234580.0, ans=0.2
2024-08-09 22:38:21,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0
2024-08-09 22:38:36,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.08 vs. limit=22.5
2024-08-09 22:38:44,001 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 from AS
2024-08-09 22:38:53,484 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 28 from Vox, 39 from AS
2024-08-09 22:39:03,883 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 from AS
2024-08-09 22:39:04,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234980.0, ans=0.125
2024-08-09 22:39:04,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9000, loss[loss=0.1285, beats_loss=0.01152, ecapa_loss=0.0003549, whisper_loss=0.1134, over 21545.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01274, ecapa_loss=0.000322, whisper_loss=0.1003, over 3905954.82 frames. ], batch size: 87, lr: 2.43e-02, grad_scale: 65536.0
2024-08-09 22:39:04,960 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-09 22:39:43,723 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on ASR_libri: loss=0.2806, beats_loss=0, ecapa_loss=0.0009572, whisper_loss=0.2711, over 922467.00 frames.
2024-08-09 22:40:01,236 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on SV_voxceleb1: loss=0.008746, beats_loss=0, ecapa_loss=0.0008746, whisper_loss=0, over 939242.00 frames.
2024-08-09 22:41:51,902 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on AT_audioset: loss=0.02976, beats_loss=0.02976, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-09 22:41:51,906 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-09 22:41:54,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 3.054e+01 3.477e+01 3.947e+01 5.844e+01, threshold=6.953e+01, percent-clipped=0.0
2024-08-09 22:41:58,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=234980.0, ans=0.125
2024-08-09 22:42:00,138 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 from AS
2024-08-09 22:42:06,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=15.0
2024-08-09 22:42:11,776 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS
2024-08-09 22:42:20,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=235180.0, ans=0.125
2024-08-09 22:42:42,630 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 27 from Vox, 26 from AS
2024-08-09 22:42:54,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2024-08-09 22:42:58,728 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
24 from LS+wenet, 20 from Vox, 22 from AS
2024-08-09 22:43:04,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9050, loss[loss=0.09978, beats_loss=0.01293, ecapa_loss=0.0003887, whisper_loss=0.08296, over 21625.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01262, ecapa_loss=0.000322, whisper_loss=0.1009, over 3902748.88 frames. ], batch size: 92, lr: 2.43e-02, grad_scale: 65536.0
2024-08-09 22:43:40,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=235680.0, ans=0.2
2024-08-09 22:43:51,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs. limit=10.0
2024-08-09 22:43:51,960 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS
2024-08-09 22:43:55,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=235780.0, ans=0.0
2024-08-09 22:43:58,822 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 12 from LS+wenet, 24 from Vox, 25 from AS
2024-08-09 22:43:59,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=235780.0, ans=0.05
2024-08-09 22:44:17,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9100, loss[loss=0.1242, beats_loss=0.01286, ecapa_loss=0.0003141, whisper_loss=0.1082, over 18996.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01266, ecapa_loss=0.0003234, whisper_loss=0.1002, over 3874810.66 frames.
], batch size: 77, lr: 2.43e-02, grad_scale: 65536.0
2024-08-09 22:44:20,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.942e+01 3.415e+01 3.847e+01 6.703e+01, threshold=6.829e+01, percent-clipped=0.0
2024-08-09 22:44:32,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=236080.0, ans=0.0
2024-08-09 22:44:37,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=236080.0, ans=10.0
2024-08-09 22:44:40,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=236080.0, ans=15.0
2024-08-09 22:44:52,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0
2024-08-09 22:44:58,304 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-09 22:45:11,939 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS
2024-08-09 22:45:29,639 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 from AS
2024-08-09 22:45:29,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=236380.0, ans=0.125
2024-08-09 22:45:34,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9150, loss[loss=0.0996, beats_loss=0.01407, ecapa_loss=0.000264, whisper_loss=0.08289, over 21398.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01266, ecapa_loss=0.0003247, whisper_loss=0.09971, over 3877877.26 frames.
], batch size: 87, lr: 2.43e-02, grad_scale: 65536.0
2024-08-09 22:45:55,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=236580.0, ans=0.1
2024-08-09 22:46:14,742 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 from AS
2024-08-09 22:46:28,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236780.0, ans=0.1
2024-08-09 22:46:40,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=236880.0, ans=0.0
2024-08-09 22:46:43,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236880.0, ans=0.1
2024-08-09 22:46:47,719 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.622e-01
2024-08-09 22:46:48,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9200, loss[loss=0.08519, beats_loss=0.01316, ecapa_loss=0.0004704, whisper_loss=0.06733, over 15416.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01268, ecapa_loss=0.0003257, whisper_loss=0.09953, over 3876777.00 frames. ], batch size: 70, lr: 2.42e-02, grad_scale: 65536.0
2024-08-09 22:46:51,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.835e+01 3.303e+01 3.887e+01 6.132e+01, threshold=6.605e+01, percent-clipped=0.0
2024-08-09 22:46:52,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=236980.0, ans=0.5
2024-08-09 22:46:56,355 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
27 from LS+wenet, 16 from Vox, 28 from AS
2024-08-09 22:46:58,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=236980.0, ans=0.125
2024-08-09 22:47:07,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=237080.0, ans=0.125
2024-08-09 22:47:19,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=237180.0, ans=0.125
2024-08-09 22:47:29,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=237180.0, ans=0.125
2024-08-09 22:47:40,917 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS
2024-08-09 22:47:48,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=237380.0, ans=0.125
2024-08-09 22:48:04,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9250, loss[loss=0.1245, beats_loss=0.01094, ecapa_loss=0.000411, whisper_loss=0.1094, over 18738.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01275, ecapa_loss=0.0003248, whisper_loss=0.1004, over 3910838.58 frames. ], batch size: 78, lr: 2.42e-02, grad_scale: 65536.0
2024-08-09 22:48:05,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0
2024-08-09 22:48:31,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=237580.0, ans=0.125
2024-08-09 22:48:34,667 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
27 from LS+wenet, 24 from Vox, 34 from AS
2024-08-09 22:48:38,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=237680.0, ans=0.0
2024-08-09 22:48:41,300 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 from AS
2024-08-09 22:48:41,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.76 vs. limit=10.0
2024-08-09 22:48:44,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=237680.0, ans=0.0
2024-08-09 22:48:53,606 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 18 from Vox, 24 from AS
2024-08-09 22:48:58,411 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 from AS
2024-08-09 22:49:06,590 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-09 22:49:24,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9300, loss[loss=0.1146, beats_loss=0.01531, ecapa_loss=0.0002626, whisper_loss=0.09671, over 19532.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01263, ecapa_loss=0.0003263, whisper_loss=0.1014, over 3928864.80 frames. ], batch size: 76, lr: 2.42e-02, grad_scale: 65536.0
2024-08-09 22:49:27,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.042e+01 3.380e+01 4.213e+01 8.159e+01, threshold=6.761e+01, percent-clipped=3.0
2024-08-09 22:49:42,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=238080.0, ans=0.125
2024-08-09 22:49:43,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=238080.0, ans=0.2
2024-08-09 22:49:44,578 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 22:49:46,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238080.0, ans=0.125 2024-08-09 22:49:54,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=238180.0, ans=0.2 2024-08-09 22:50:06,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238180.0, ans=0.1 2024-08-09 22:50:09,590 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 22:50:25,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238380.0, ans=0.1 2024-08-09 22:50:41,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9350, loss[loss=0.09984, beats_loss=0.0119, ecapa_loss=0.0003201, whisper_loss=0.08475, over 17911.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01265, ecapa_loss=0.000327, whisper_loss=0.1011, over 3932247.21 frames. ], batch size: 73, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:50:43,394 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 22:51:02,081 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 21 from LS+wenet, 27 from Vox, 48 fro AS 2024-08-09 22:51:02,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=238580.0, ans=0.125 2024-08-09 22:51:05,860 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 22:52:02,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238780.0, ans=0.125 2024-08-09 22:52:21,503 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.022e+03 2024-08-09 22:52:31,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2024-08-09 22:52:34,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9400, loss[loss=0.122, beats_loss=0.01213, ecapa_loss=0.0004213, whisper_loss=0.1057, over 21268.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01261, ecapa_loss=0.000327, whisper_loss=0.1005, over 3888142.64 frames. ], batch size: 91, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:52:37,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.975e+01 3.274e+01 3.809e+01 6.351e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-09 22:52:41,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=238980.0, ans=0.125 2024-08-09 22:52:42,821 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 22:53:22,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=239180.0, ans=0.0 2024-08-09 22:53:24,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=239280.0, ans=0.0 2024-08-09 22:53:25,828 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 22:53:43,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=239280.0, ans=0.0 2024-08-09 22:53:43,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2024-08-09 22:54:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=239380.0, ans=0.125 2024-08-09 22:54:04,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=239380.0, ans=0.0 2024-08-09 22:54:06,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9450, loss[loss=0.1069, beats_loss=0.01372, ecapa_loss=0.0003192, whisper_loss=0.09001, over 16831.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0126, ecapa_loss=0.000328, whisper_loss=0.1003, over 3897137.89 frames. ], batch size: 69, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:54:27,375 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 22:54:32,380 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 22:54:36,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239580.0, ans=0.0 2024-08-09 22:54:43,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=239580.0, ans=0.125 2024-08-09 22:54:51,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=239680.0, ans=0.0 2024-08-09 22:55:46,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239880.0, ans=0.1 2024-08-09 22:55:51,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9500, loss[loss=0.1422, beats_loss=0.0138, ecapa_loss=0.0003045, whisper_loss=0.1254, over 23213.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01265, ecapa_loss=0.0003277, whisper_loss=0.1004, over 3909541.41 frames. ], batch size: 91, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:55:59,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.955e+01 3.513e+01 3.972e+01 7.065e+01, threshold=7.026e+01, percent-clipped=1.0 2024-08-09 22:56:23,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=240080.0, ans=0.02 2024-08-09 22:56:34,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=240080.0, ans=0.125 2024-08-09 22:56:41,534 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 22:56:44,842 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 22:56:45,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=240180.0, ans=0.2 2024-08-09 22:56:47,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=240180.0, ans=0.0 2024-08-09 22:56:49,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=240180.0, ans=0.125 2024-08-09 22:56:55,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=240180.0, ans=0.125 2024-08-09 22:57:13,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240280.0, ans=0.1 2024-08-09 22:57:30,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=240380.0, ans=0.07 2024-08-09 22:57:34,548 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 22:57:34,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=240380.0, ans=0.0 2024-08-09 22:57:39,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-08-09 22:57:50,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9550, loss[loss=0.1209, beats_loss=0.01216, ecapa_loss=0.0002553, whisper_loss=0.1062, over 19517.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01269, ecapa_loss=0.0003271, whisper_loss=0.1003, over 3877399.42 frames. 
], batch size: 74, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:57:58,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=240480.0, ans=0.125 2024-08-09 22:58:12,093 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 22:58:40,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=240680.0, ans=0.0 2024-08-09 22:58:42,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=240680.0, ans=0.07 2024-08-09 22:59:11,100 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 22:59:24,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=240880.0, ans=0.125 2024-08-09 22:59:38,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=240880.0, ans=0.0 2024-08-09 22:59:46,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9600, loss[loss=0.1018, beats_loss=0.009818, ecapa_loss=0.0004052, whisper_loss=0.08789, over 13242.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01265, ecapa_loss=0.0003252, whisper_loss=0.09958, over 3834619.78 frames. ], batch size: 54, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:59:47,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=240980.0, ans=0.035 2024-08-09 22:59:49,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.841e+01 3.249e+01 3.780e+01 5.366e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-09 23:00:16,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. 
limit=15.0 2024-08-09 23:00:20,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=241080.0, ans=0.0 2024-08-09 23:00:25,426 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 23:00:31,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2024-08-09 23:00:54,089 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-09 23:00:57,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=241280.0, ans=0.09899494936611666 2024-08-09 23:01:06,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=241280.0, ans=0.125 2024-08-09 23:01:11,736 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 23:01:17,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0 2024-08-09 23:01:19,772 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 23:01:33,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9650, loss[loss=0.1398, beats_loss=0.01023, ecapa_loss=0.0003622, whisper_loss=0.1259, over 23099.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01268, ecapa_loss=0.000327, whisper_loss=0.09994, over 3876622.65 frames. ], batch size: 90, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:02:00,208 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 35 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 23:02:02,221 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 23:02:48,573 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-09 23:02:58,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9700, loss[loss=0.1239, beats_loss=0.01294, ecapa_loss=0.0002668, whisper_loss=0.1083, over 24238.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01267, ecapa_loss=0.0003268, whisper_loss=0.1003, over 3860107.01 frames. ], batch size: 91, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:03:01,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 3.064e+01 3.484e+01 4.019e+01 6.587e+01, threshold=6.968e+01, percent-clipped=2.0 2024-08-09 23:03:16,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=242080.0, ans=0.0 2024-08-09 23:03:38,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=242180.0, ans=0.125 2024-08-09 23:03:41,172 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 23:03:48,133 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-09 23:04:21,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9750, loss[loss=0.1137, beats_loss=0.0114, ecapa_loss=0.0003434, whisper_loss=0.09881, over 14912.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01259, ecapa_loss=0.0003265, whisper_loss=0.1007, over 3882499.07 frames. ], batch size: 59, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:04:25,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.87 vs. 
limit=15.0 2024-08-09 23:04:30,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=242480.0, ans=0.0 2024-08-09 23:04:30,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=242480.0, ans=0.125 2024-08-09 23:05:07,133 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 23:05:17,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-09 23:05:19,296 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 23:05:39,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=242880.0, ans=0.5 2024-08-09 23:05:39,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=17.42 vs. limit=15.0 2024-08-09 23:05:41,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9800, loss[loss=0.09758, beats_loss=0.01721, ecapa_loss=0.0002445, whisper_loss=0.07792, over 16857.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01264, ecapa_loss=0.0003245, whisper_loss=0.09979, over 3844116.11 frames. ], batch size: 65, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:05:43,254 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
8 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-09 23:05:44,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.875e+01 3.358e+01 3.972e+01 6.084e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-09 23:05:57,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=243080.0, ans=0.0 2024-08-09 23:06:14,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243180.0, ans=0.0 2024-08-09 23:06:15,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2024-08-09 23:06:25,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=243180.0, ans=0.5 2024-08-09 23:06:29,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243180.0, ans=0.1 2024-08-09 23:06:32,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243280.0, ans=0.1 2024-08-09 23:06:49,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=243380.0, ans=0.125 2024-08-09 23:06:56,197 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-09 23:07:01,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=12.0 2024-08-09 23:07:02,609 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 23:07:05,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9850, loss[loss=0.154, beats_loss=0.006128, ecapa_loss=0.0003102, whisper_loss=0.1447, over 17382.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01248, ecapa_loss=0.0003242, whisper_loss=0.1014, over 3861105.24 frames. ], batch size: 61, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:07:11,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=243480.0, ans=0.2 2024-08-09 23:07:14,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=243480.0, ans=0.125 2024-08-09 23:07:27,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=243580.0, ans=0.0 2024-08-09 23:07:39,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=243680.0, ans=0.0 2024-08-09 23:07:45,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=243680.0, ans=0.125 2024-08-09 23:07:55,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=243680.0, ans=0.0 2024-08-09 23:08:18,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-08-09 23:08:33,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9900, loss[loss=0.1251, beats_loss=0.01125, ecapa_loss=0.0003342, whisper_loss=0.1105, over 21101.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01256, ecapa_loss=0.0003223, whisper_loss=0.1018, over 3919077.19 frames. ], batch size: 84, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:08:35,400 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 23:08:35,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243980.0, ans=0.0 2024-08-09 23:08:36,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 3.025e+01 3.445e+01 3.906e+01 6.336e+01, threshold=6.890e+01, percent-clipped=0.0 2024-08-09 23:08:46,569 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 23:08:51,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-09 23:08:52,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=244080.0, ans=0.04949747468305833 2024-08-09 23:09:05,949 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-09 23:09:19,892 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 23:09:30,777 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-09 23:09:33,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=244280.0, ans=0.125 2024-08-09 23:09:36,562 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.596e-01 2024-08-09 23:09:55,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 9950, loss[loss=0.118, beats_loss=0.01265, ecapa_loss=0.0003795, whisper_loss=0.1015, over 21634.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01257, ecapa_loss=0.0003213, whisper_loss=0.1016, over 3908137.45 frames. ], batch size: 90, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:10:01,251 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 23:10:08,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-08-09 23:10:14,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=244580.0, ans=0.125 2024-08-09 23:10:15,792 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 23:10:49,450 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 23:10:50,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2024-08-09 23:10:56,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=244780.0, ans=0.0 2024-08-09 23:11:06,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-09 23:11:17,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=244980.0, ans=0.0 2024-08-09 23:11:18,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10000, loss[loss=0.1114, beats_loss=0.01504, ecapa_loss=0.0003028, whisper_loss=0.09338, over 21486.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01262, ecapa_loss=0.0003219, whisper_loss=0.1017, over 3916223.10 frames. 
], batch size: 86, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:11:22,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.876e+01 3.207e+01 3.745e+01 5.513e+01, threshold=6.413e+01, percent-clipped=0.0 2024-08-09 23:11:29,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244980.0, ans=0.1 2024-08-09 23:11:36,264 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 23:11:44,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=245080.0, ans=0.2 2024-08-09 23:11:49,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=22.5 2024-08-09 23:12:06,626 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 23:12:14,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=245280.0, ans=0.125 2024-08-09 23:12:19,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245280.0, ans=0.125 2024-08-09 23:12:33,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=245380.0, ans=0.125 2024-08-09 23:12:38,768 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-09 23:12:43,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=245380.0, ans=0.2 2024-08-09 23:12:50,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10050, loss[loss=0.1122, beats_loss=0.01379, ecapa_loss=0.0003535, whisper_loss=0.09483, over 22976.00 frames. 
], tot_loss[loss=0.1178, beats_loss=0.01262, ecapa_loss=0.0003213, whisper_loss=0.102, over 3905196.27 frames. ], batch size: 94, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:12:56,645 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 23:13:10,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=245580.0, ans=0.0 2024-08-09 23:13:45,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245680.0, ans=0.1 2024-08-09 23:13:56,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=245780.0, ans=0.125 2024-08-09 23:13:58,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=245780.0, ans=0.0 2024-08-09 23:14:10,003 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 23:14:10,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=245880.0, ans=0.125 2024-08-09 23:14:19,550 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 23:14:24,039 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10100, loss[loss=0.114, beats_loss=0.01388, ecapa_loss=0.0003614, whisper_loss=0.09654, over 21777.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.0126, ecapa_loss=0.0003219, whisper_loss=0.1023, over 3949266.76 frames. ], batch size: 94, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:14:25,122 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
33 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 23:14:25,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=245980.0, ans=0.125 2024-08-09 23:14:28,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.998e+01 3.344e+01 3.820e+01 6.746e+01, threshold=6.687e+01, percent-clipped=3.0 2024-08-09 23:14:29,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=245980.0, ans=0.1 2024-08-09 23:14:30,208 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 23:14:31,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=245980.0, ans=0.0 2024-08-09 23:14:36,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-09 23:14:41,648 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 23:14:50,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=246080.0, ans=0.0 2024-08-09 23:14:56,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246180.0, ans=0.1 2024-08-09 23:15:12,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=246280.0, ans=0.0 2024-08-09 23:15:22,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=246280.0, ans=0.09899494936611666 2024-08-09 23:15:24,954 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 23:15:26,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=246280.0, ans=0.125 2024-08-09 23:15:36,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246380.0, ans=0.1 2024-08-09 23:15:43,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10150, loss[loss=0.1376, beats_loss=0.01128, ecapa_loss=0.0002574, whisper_loss=0.1237, over 17804.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01253, ecapa_loss=0.0003234, whisper_loss=0.1021, over 3954814.66 frames. ], batch size: 65, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:15:43,517 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 23:15:50,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246480.0, ans=0.0 2024-08-09 23:15:53,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=246480.0, ans=0.125 2024-08-09 23:15:58,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=246580.0, ans=0.125 2024-08-09 23:16:07,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=246580.0, ans=0.04949747468305833 2024-08-09 23:16:12,632 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 23:16:38,862 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-09 23:16:47,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.94 vs. 
limit=10.0 2024-08-09 23:16:47,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2024-08-09 23:16:48,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=246880.0, ans=0.07 2024-08-09 23:16:57,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10200, loss[loss=0.1212, beats_loss=0.01281, ecapa_loss=0.0003256, whisper_loss=0.1052, over 18681.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01249, ecapa_loss=0.0003259, whisper_loss=0.102, over 3925201.02 frames. ], batch size: 73, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:16:58,122 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-09 23:17:00,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.913e+01 3.327e+01 3.843e+01 5.703e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 23:17:02,231 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 23:17:04,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246980.0, ans=0.125 2024-08-09 23:17:23,926 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 23:17:41,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=247280.0, ans=0.0 2024-08-09 23:17:47,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=247280.0, ans=0.125 2024-08-09 23:18:02,902 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 23:18:03,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=247380.0, ans=0.0 2024-08-09 23:18:06,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=247380.0, ans=10.0 2024-08-09 23:18:09,936 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10250, loss[loss=0.1299, beats_loss=0.008612, ecapa_loss=0.0003348, whisper_loss=0.1179, over 23615.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01243, ecapa_loss=0.0003246, whisper_loss=0.102, over 3917342.05 frames. ], batch size: 91, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:18:22,675 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 23:18:30,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=247580.0, ans=0.5 2024-08-09 23:18:48,242 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 23:18:59,313 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 23:19:14,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=247880.0, ans=0.125 2024-08-09 23:19:17,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=247880.0, ans=0.0 2024-08-09 23:19:21,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10300, loss[loss=0.1078, beats_loss=0.01269, ecapa_loss=0.0002503, whisper_loss=0.09266, over 22996.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01251, ecapa_loss=0.000322, whisper_loss=0.102, over 3931957.87 frames. 
], batch size: 89, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:19:22,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2024-08-09 23:19:25,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.179e+01 3.546e+01 4.118e+01 7.373e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-09 23:19:38,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=248080.0, ans=0.125 2024-08-09 23:19:47,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248080.0, ans=0.125 2024-08-09 23:19:52,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2024-08-09 23:20:04,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248280.0, ans=0.1 2024-08-09 23:20:05,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=248280.0, ans=0.0 2024-08-09 23:20:17,447 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-09 23:20:28,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-09 23:20:32,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2024-08-09 23:20:34,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10350, loss[loss=0.1156, beats_loss=0.01254, ecapa_loss=0.0002914, whisper_loss=0.1001, over 18251.00 frames. 
], tot_loss[loss=0.1178, beats_loss=0.01253, ecapa_loss=0.0003224, whisper_loss=0.102, over 3913236.78 frames. ], batch size: 74, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:20:36,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=248480.0, ans=0.0 2024-08-09 23:20:45,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=248480.0, ans=0.025 2024-08-09 23:20:47,776 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 23:20:55,867 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 23:21:02,697 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-09 23:21:11,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=248680.0, ans=0.125 2024-08-09 23:21:22,215 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 23:21:46,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10400, loss[loss=0.1092, beats_loss=0.01203, ecapa_loss=0.0003481, whisper_loss=0.09371, over 18031.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01251, ecapa_loss=0.0003222, whisper_loss=0.1014, over 3904332.41 frames. ], batch size: 74, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:21:48,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.757e+01 3.226e+01 3.794e+01 6.112e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-09 23:21:53,683 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
22 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 23:21:56,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=248980.0, ans=0.0 2024-08-09 23:22:08,013 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 23:22:10,609 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-09 23:22:19,162 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-09 23:22:19,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=249180.0, ans=0.05 2024-08-09 23:22:35,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=249280.0, ans=0.125 2024-08-09 23:22:45,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2024-08-09 23:22:47,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=249380.0, ans=0.125 2024-08-09 23:22:54,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10450, loss[loss=0.1152, beats_loss=0.0117, ecapa_loss=0.000363, whisper_loss=0.09988, over 16671.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01258, ecapa_loss=0.0003241, whisper_loss=0.1012, over 3893125.02 frames. ], batch size: 69, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:23:00,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249480.0, ans=0.1 2024-08-09 23:23:05,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. 
limit=6.0 2024-08-09 23:23:06,005 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 23:23:07,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=249580.0, ans=0.025 2024-08-09 23:23:19,715 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 23:23:20,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249580.0, ans=0.0 2024-08-09 23:23:41,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=249780.0, ans=0.125 2024-08-09 23:23:49,390 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-09 23:23:53,571 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 23:23:59,746 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 23:24:00,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=249880.0, ans=0.07 2024-08-09 23:24:02,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10500, loss[loss=0.1063, beats_loss=0.01565, ecapa_loss=0.0003094, whisper_loss=0.0876, over 13885.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01264, ecapa_loss=0.0003221, whisper_loss=0.1002, over 3898288.66 frames. ], batch size: 55, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:24:02,943 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-09 23:24:05,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.948e+01 3.458e+01 4.084e+01 6.883e+01, threshold=6.915e+01, percent-clipped=1.0 2024-08-09 23:24:10,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=249980.0, ans=0.0 2024-08-09 23:24:17,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=250080.0, ans=0.2 2024-08-09 23:24:29,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=250180.0, ans=0.125 2024-08-09 23:24:32,779 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 23:24:41,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.110e+03 2024-08-09 23:24:55,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=250280.0, ans=0.125 2024-08-09 23:25:05,379 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 23:25:09,419 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 23:25:13,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10550, loss[loss=0.1213, beats_loss=0.01133, ecapa_loss=0.000341, whisper_loss=0.1065, over 22831.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01253, ecapa_loss=0.0003228, whisper_loss=0.1008, over 3865936.08 frames. 
], batch size: 91, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:25:21,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=250480.0, ans=0.125 2024-08-09 23:25:30,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=250580.0, ans=0.0 2024-08-09 23:25:48,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250680.0, ans=0.125 2024-08-09 23:26:02,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=250780.0, ans=0.0 2024-08-09 23:26:20,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=250880.0, ans=0.125 2024-08-09 23:26:22,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=250980.0, ans=0.0 2024-08-09 23:26:22,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10600, loss[loss=0.09457, beats_loss=0.01481, ecapa_loss=0.0002739, whisper_loss=0.07702, over 20153.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.0125, ecapa_loss=0.0003262, whisper_loss=0.101, over 3901458.12 frames. 
], batch size: 81, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:26:25,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 3.120e+01 3.519e+01 3.971e+01 7.530e+01, threshold=7.037e+01, percent-clipped=1.0 2024-08-09 23:26:27,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=250980.0, ans=0.2 2024-08-09 23:26:34,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=250980.0, ans=0.125 2024-08-09 23:26:44,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-09 23:27:09,334 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 28 from Vox, 46 fro AS 2024-08-09 23:27:26,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=251380.0, ans=0.0 2024-08-09 23:27:32,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10650, loss[loss=0.1301, beats_loss=0.01022, ecapa_loss=0.0002734, whisper_loss=0.1171, over 15327.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0125, ecapa_loss=0.0003226, whisper_loss=0.1011, over 3892512.54 frames. ], batch size: 57, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:28:24,881 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 23:28:41,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10700, loss[loss=0.1312, beats_loss=0.009439, ecapa_loss=0.000305, whisper_loss=0.1187, over 15989.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01254, ecapa_loss=0.0003191, whisper_loss=0.1012, over 3880905.17 frames. 
], batch size: 61, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:28:44,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.278e+01 2.878e+01 3.295e+01 3.921e+01 5.869e+01, threshold=6.590e+01, percent-clipped=0.0 2024-08-09 23:29:23,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=252280.0, ans=0.2 2024-08-09 23:29:31,794 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-09 23:29:51,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10750, loss[loss=0.105, beats_loss=0.01262, ecapa_loss=0.0003747, whisper_loss=0.08859, over 20870.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01252, ecapa_loss=0.0003195, whisper_loss=0.1018, over 3892277.13 frames. ], batch size: 88, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:29:54,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=252480.0, ans=0.125 2024-08-09 23:29:54,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252480.0, ans=0.125 2024-08-09 23:29:59,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252480.0, ans=0.1 2024-08-09 23:30:05,647 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 23:30:10,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=252580.0, ans=0.09899494936611666 2024-08-09 23:30:11,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=252580.0, ans=0.2 2024-08-09 23:30:11,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0 2024-08-09 23:30:27,931 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 23:30:29,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252680.0, ans=0.1 2024-08-09 23:30:38,438 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 23:30:42,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=252780.0, ans=15.0 2024-08-09 23:30:48,316 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-09 23:30:48,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=252880.0, ans=0.125 2024-08-09 23:30:50,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. 
limit=6.0 2024-08-09 23:30:54,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=252880.0, ans=0.125 2024-08-09 23:31:00,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10800, loss[loss=0.1298, beats_loss=0.01563, ecapa_loss=0.0002667, whisper_loss=0.1115, over 21641.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01243, ecapa_loss=0.0003192, whisper_loss=0.1026, over 3879686.44 frames. ], batch size: 86, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:31:03,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.032e+01 3.349e+01 3.769e+01 6.080e+01, threshold=6.698e+01, percent-clipped=0.0 2024-08-09 23:31:19,314 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 23:31:35,984 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 23:31:36,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-08-09 23:32:06,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=253480.0, ans=0.025 2024-08-09 23:32:07,620 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10850, loss[loss=0.09922, beats_loss=0.01137, ecapa_loss=0.0003613, whisper_loss=0.08424, over 18816.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01246, ecapa_loss=0.0003199, whisper_loss=0.1022, over 3902583.02 frames. 
], batch size: 77, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:32:20,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=253580.0, ans=15.0 2024-08-09 23:32:24,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253580.0, ans=0.1 2024-08-09 23:32:27,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253580.0, ans=0.1 2024-08-09 23:32:28,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=253580.0, ans=10.0 2024-08-09 23:32:34,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2024-08-09 23:32:45,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-08-09 23:32:45,949 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 23:32:46,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.49 vs. limit=22.5 2024-08-09 23:33:06,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253880.0, ans=0.1 2024-08-09 23:33:15,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10900, loss[loss=0.1363, beats_loss=0.01059, ecapa_loss=0.0003065, whisper_loss=0.1227, over 15221.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01237, ecapa_loss=0.0003205, whisper_loss=0.1029, over 3922897.17 frames. 
], batch size: 56, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:33:18,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.959e+01 3.403e+01 3.969e+01 5.664e+01, threshold=6.807e+01, percent-clipped=0.0 2024-08-09 23:33:24,225 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 23:34:16,480 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 29 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-09 23:34:22,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 10950, loss[loss=0.1168, beats_loss=0.01157, ecapa_loss=0.0003384, whisper_loss=0.1019, over 21332.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01232, ecapa_loss=0.0003183, whisper_loss=0.103, over 3914267.42 frames. ], batch size: 85, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:34:29,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=254480.0, ans=0.0 2024-08-09 23:34:47,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=254580.0, ans=0.125 2024-08-09 23:34:56,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=254680.0, ans=0.0 2024-08-09 23:35:04,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=254780.0, ans=0.125 2024-08-09 23:35:05,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=254780.0, ans=0.07 2024-08-09 23:35:30,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11000, loss[loss=0.1113, beats_loss=0.01635, ecapa_loss=0.0002491, whisper_loss=0.09241, over 23411.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01229, ecapa_loss=0.0003204, whisper_loss=0.1035, over 3928721.80 frames. 
], batch size: 94, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:35:33,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.844e+01 3.291e+01 3.745e+01 5.513e+01, threshold=6.582e+01, percent-clipped=0.0 2024-08-09 23:35:39,613 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 23:36:16,345 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 23:36:22,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255280.0, ans=0.1 2024-08-09 23:36:41,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11050, loss[loss=0.09235, beats_loss=0.01472, ecapa_loss=0.0003194, whisper_loss=0.07443, over 22759.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01238, ecapa_loss=0.0003196, whisper_loss=0.1023, over 3915302.06 frames. ], batch size: 96, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:36:43,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255480.0, ans=0.125 2024-08-09 23:36:48,827 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.306e-01 2024-08-09 23:36:54,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. 
limit=6.0 2024-08-09 23:37:05,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=255580.0, ans=0.0 2024-08-09 23:37:20,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=255680.0, ans=0.09899494936611666 2024-08-09 23:37:21,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=255780.0, ans=0.125 2024-08-09 23:37:34,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=255780.0, ans=0.125 2024-08-09 23:37:35,352 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 23:37:42,339 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-09 23:37:46,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=255880.0, ans=0.125 2024-08-09 23:37:50,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11100, loss[loss=0.117, beats_loss=0.01335, ecapa_loss=0.0003114, whisper_loss=0.1005, over 21966.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01245, ecapa_loss=0.0003186, whisper_loss=0.102, over 3930616.29 frames. ], batch size: 89, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:37:53,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+01 3.083e+01 3.527e+01 4.357e+01 6.576e+01, threshold=7.054e+01, percent-clipped=0.0 2024-08-09 23:38:16,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=256080.0, ans=0.0 2024-08-09 23:38:18,797 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 23:38:23,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=256180.0, ans=0.125 2024-08-09 23:38:25,568 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 23:38:39,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. limit=10.0 2024-08-09 23:38:42,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=256280.0, ans=0.0 2024-08-09 23:38:43,229 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 23:38:59,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11150, loss[loss=0.1047, beats_loss=0.01269, ecapa_loss=0.0003809, whisper_loss=0.08824, over 15143.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01244, ecapa_loss=0.0003182, whisper_loss=0.1025, over 3941406.41 frames. ], batch size: 65, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:39:00,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=256480.0, ans=0.125 2024-08-09 23:39:03,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.23 vs. limit=22.5 2024-08-09 23:39:26,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=256680.0, ans=0.2 2024-08-09 23:39:50,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=256780.0, ans=0.125 2024-08-09 23:40:02,852 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 23:40:03,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-09 23:40:05,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-09 23:40:09,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11200, loss[loss=0.1135, beats_loss=0.01169, ecapa_loss=0.0002573, whisper_loss=0.09923, over 19896.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01242, ecapa_loss=0.0003188, whisper_loss=0.1016, over 3918991.08 frames. ], batch size: 77, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:40:11,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=256980.0, ans=0.125 2024-08-09 23:40:12,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.109e+01 3.535e+01 4.149e+01 6.453e+01, threshold=7.070e+01, percent-clipped=0.0 2024-08-09 23:40:12,704 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 23:40:20,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256980.0, ans=0.125 2024-08-09 23:40:29,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257080.0, ans=0.1 2024-08-09 23:40:42,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=257180.0, ans=0.0 2024-08-09 23:40:48,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=257180.0, ans=0.04949747468305833 2024-08-09 23:40:48,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=257180.0, ans=0.2 2024-08-09 23:40:49,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=257180.0, ans=0.125 2024-08-09 23:41:00,156 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-09 23:41:06,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0 2024-08-09 23:41:19,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11250, loss[loss=0.1047, beats_loss=0.01333, ecapa_loss=0.0003066, whisper_loss=0.08834, over 17907.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01246, ecapa_loss=0.0003193, whisper_loss=0.1011, over 3911236.03 frames. 
], batch size: 73, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:41:20,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257480.0, ans=0.125 2024-08-09 23:42:03,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2024-08-09 23:42:04,428 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-09 23:42:12,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=257780.0, ans=0.125 2024-08-09 23:42:20,485 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 23:42:28,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11300, loss[loss=0.1143, beats_loss=0.01308, ecapa_loss=0.0002947, whisper_loss=0.09825, over 18769.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0125, ecapa_loss=0.000317, whisper_loss=0.1009, over 3920480.27 frames. ], batch size: 75, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:42:31,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 3.110e+01 3.449e+01 4.025e+01 6.550e+01, threshold=6.899e+01, percent-clipped=0.0 2024-08-09 23:42:31,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257980.0, ans=0.125 2024-08-09 23:42:31,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=257980.0, ans=0.025 2024-08-09 23:42:40,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=258080.0, ans=0.2 2024-08-09 23:43:06,370 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 23:43:17,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=258280.0, ans=0.125 2024-08-09 23:43:20,382 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 23:43:22,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=12.0 2024-08-09 23:43:22,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-09 23:43:26,783 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-09 23:43:36,202 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11350, loss[loss=0.1284, beats_loss=0.01033, ecapa_loss=0.0004017, whisper_loss=0.114, over 19208.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01245, ecapa_loss=0.0003179, whisper_loss=0.1017, over 3938548.05 frames. ], batch size: 79, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:43:40,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.37 vs. limit=22.5 2024-08-09 23:43:41,008 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 23:43:44,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.53 vs. limit=22.5 2024-08-09 23:43:46,123 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 23:43:46,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=258480.0, ans=0.125 2024-08-09 23:43:52,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=258580.0, ans=0.0 2024-08-09 23:44:02,952 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 23:44:03,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=258680.0, ans=0.04949747468305833 2024-08-09 23:44:07,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=258680.0, ans=0.125 2024-08-09 23:44:12,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=258680.0, ans=0.5 2024-08-09 23:44:19,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=258780.0, ans=0.125 2024-08-09 23:44:28,786 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 23 from Vox, 15 fro AS 2024-08-09 23:44:36,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=258880.0, ans=0.125 2024-08-09 23:44:36,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=258880.0, ans=0.0 2024-08-09 23:44:36,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. 
limit=15.0 2024-08-09 23:44:41,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=258880.0, ans=0.0 2024-08-09 23:44:44,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11400, loss[loss=0.1202, beats_loss=0.0127, ecapa_loss=0.0002753, whisper_loss=0.1048, over 23150.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01234, ecapa_loss=0.0003196, whisper_loss=0.102, over 3893167.78 frames. ], batch size: 91, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:44:47,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.889e+01 3.232e+01 3.833e+01 5.860e+01, threshold=6.464e+01, percent-clipped=0.0 2024-08-09 23:44:54,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=258980.0, ans=0.0 2024-08-09 23:45:06,549 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 23:45:08,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-08-09 23:45:16,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259180.0, ans=0.1 2024-08-09 23:45:20,392 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-09 23:45:28,636 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-09 23:45:45,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=259380.0, ans=0.0 2024-08-09 23:45:58,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11450, loss[loss=0.1239, beats_loss=0.01148, ecapa_loss=0.000379, whisper_loss=0.1086, over 21424.00 frames. 
], tot_loss[loss=0.1173, beats_loss=0.01237, ecapa_loss=0.0003182, whisper_loss=0.1017, over 3866824.18 frames. ], batch size: 92, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:46:01,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=259480.0, ans=0.0 2024-08-09 23:46:04,455 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-09 23:46:07,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259480.0, ans=0.1 2024-08-09 23:46:15,576 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 23:46:39,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=259780.0, ans=0.0 2024-08-09 23:46:53,184 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 19 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-09 23:47:00,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=259880.0, ans=0.0 2024-08-09 23:47:01,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259880.0, ans=0.1 2024-08-09 23:47:08,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11500, loss[loss=0.1245, beats_loss=0.0111, ecapa_loss=0.0003313, whisper_loss=0.11, over 22408.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01238, ecapa_loss=0.0003171, whisper_loss=0.1015, over 3885144.09 frames. 
], batch size: 92, lr: 2.32e-02, grad_scale: 131072.0 2024-08-09 23:47:11,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 3.028e+01 3.430e+01 4.047e+01 6.324e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 23:47:11,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=259980.0, ans=0.125 2024-08-09 23:47:39,257 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 23:47:39,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=260180.0, ans=0.0 2024-08-09 23:47:40,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=260180.0, ans=0.125 2024-08-09 23:48:01,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5 2024-08-09 23:48:17,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11550, loss[loss=0.1046, beats_loss=0.01358, ecapa_loss=0.0003365, whisper_loss=0.0877, over 21634.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01248, ecapa_loss=0.0003179, whisper_loss=0.1007, over 3859117.02 frames. ], batch size: 91, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:48:22,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. 
limit=6.0 2024-08-09 23:48:23,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=260480.0, ans=0.125 2024-08-09 23:48:38,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=260580.0, ans=0.09899494936611666 2024-08-09 23:48:39,895 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 23:48:42,717 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-09 23:48:48,467 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 23:48:57,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=260680.0, ans=0.5 2024-08-09 23:48:57,865 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-09 23:48:59,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=260780.0, ans=0.125 2024-08-09 23:49:05,991 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 23:49:07,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=260780.0, ans=0.035 2024-08-09 23:49:16,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=260880.0, ans=0.0 2024-08-09 23:49:26,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11600, loss[loss=0.1001, beats_loss=0.01367, ecapa_loss=0.0003041, whisper_loss=0.08337, over 14650.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0124, ecapa_loss=0.0003178, whisper_loss=0.1012, over 3868224.22 frames. 
], batch size: 56, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:49:29,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.873e+01 3.365e+01 3.781e+01 5.038e+01, threshold=6.731e+01, percent-clipped=0.0 2024-08-09 23:49:31,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=260980.0, ans=0.0 2024-08-09 23:49:32,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=260980.0, ans=0.125 2024-08-09 23:49:33,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=260980.0, ans=0.2 2024-08-09 23:49:41,718 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:49:51,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261080.0, ans=0.1 2024-08-09 23:49:58,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=261180.0, ans=0.125 2024-08-09 23:50:12,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=261280.0, ans=0.015 2024-08-09 23:50:13,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=261280.0, ans=0.125 2024-08-09 23:50:13,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-08-09 23:50:19,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=261280.0, ans=0.125 2024-08-09 23:50:26,507 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 23:50:36,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2024-08-09 23:50:37,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11650, loss[loss=0.1273, beats_loss=0.0116, ecapa_loss=0.0003359, whisper_loss=0.1123, over 22637.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01245, ecapa_loss=0.0003168, whisper_loss=0.101, over 3878870.26 frames. ], batch size: 90, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:51:03,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=261580.0, ans=0.2 2024-08-09 23:51:08,538 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 23:51:12,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2024-08-09 23:51:17,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=261680.0, ans=0.0 2024-08-09 23:51:20,203 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-09 23:51:21,624 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-09 23:51:21,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=261780.0, ans=0.0 2024-08-09 23:51:25,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=261780.0, ans=0.0 2024-08-09 23:51:26,928 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 23:51:33,750 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-09 23:51:36,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=261880.0, ans=0.125 2024-08-09 23:51:37,677 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-09 23:51:39,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2024-08-09 23:51:46,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11700, loss[loss=0.1218, beats_loss=0.009213, ecapa_loss=0.0003893, whisper_loss=0.1087, over 16544.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.0125, ecapa_loss=0.000317, whisper_loss=0.101, over 3877401.55 frames. ], batch size: 66, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:51:48,325 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-09 23:51:49,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 3.059e+01 3.535e+01 4.179e+01 1.066e+02, threshold=7.070e+01, percent-clipped=1.0 2024-08-09 23:51:57,965 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-09 23:52:14,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-09 23:52:14,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. 
limit=15.0 2024-08-09 23:52:24,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=262180.0, ans=0.125 2024-08-09 23:52:36,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-09 23:52:41,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=262380.0, ans=0.0 2024-08-09 23:52:54,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11750, loss[loss=0.09203, beats_loss=0.0159, ecapa_loss=0.0002275, whisper_loss=0.07386, over 17358.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0126, ecapa_loss=0.0003188, whisper_loss=0.1007, over 3888308.15 frames. ], batch size: 69, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:52:57,555 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-09 23:52:59,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2024-08-09 23:53:29,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-08-09 23:53:33,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=262680.0, ans=0.2 2024-08-09 23:53:54,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262880.0, ans=0.125 2024-08-09 23:54:02,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11800, loss[loss=0.1301, beats_loss=0.01165, ecapa_loss=0.0002511, whisper_loss=0.116, over 20303.00 frames. 
], tot_loss[loss=0.1176, beats_loss=0.0125, ecapa_loss=0.0003188, whisper_loss=0.1019, over 3896153.11 frames. ], batch size: 75, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:54:05,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 3.014e+01 3.516e+01 4.289e+01 8.691e+01, threshold=7.033e+01, percent-clipped=2.0 2024-08-09 23:54:19,199 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 23:54:29,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=263180.0, ans=0.0 2024-08-09 23:54:37,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=263180.0, ans=0.125 2024-08-09 23:54:51,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263280.0, ans=0.1 2024-08-09 23:54:51,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=263280.0, ans=0.09899494936611666 2024-08-09 23:55:07,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=263380.0, ans=15.0 2024-08-09 23:55:09,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263380.0, ans=0.125 2024-08-09 23:55:11,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11850, loss[loss=0.1188, beats_loss=0.01106, ecapa_loss=0.0003518, whisper_loss=0.1042, over 16426.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01253, ecapa_loss=0.0003154, whisper_loss=0.1018, over 3893137.48 frames. 
], batch size: 66, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:55:40,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=263680.0, ans=0.2 2024-08-09 23:55:42,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=263680.0, ans=0.125 2024-08-09 23:55:48,855 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 23:56:12,228 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 23:56:12,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=263880.0, ans=0.125 2024-08-09 23:56:18,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11900, loss[loss=0.1058, beats_loss=0.01177, ecapa_loss=0.000232, whisper_loss=0.0917, over 16683.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01248, ecapa_loss=0.0003152, whisper_loss=0.1022, over 3911443.95 frames. ], batch size: 62, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:56:21,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.968e+01 3.550e+01 4.423e+01 6.843e+01, threshold=7.099e+01, percent-clipped=0.0 2024-08-09 23:56:24,191 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 23:56:39,336 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 22 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-09 23:56:45,907 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
12 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 23:56:58,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=264280.0, ans=0.125 2024-08-09 23:57:01,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264280.0, ans=0.1 2024-08-09 23:57:13,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=264380.0, ans=0.2 2024-08-09 23:57:26,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 11950, loss[loss=0.08675, beats_loss=0.01409, ecapa_loss=0.0003443, whisper_loss=0.06922, over 19699.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01243, ecapa_loss=0.0003142, whisper_loss=0.1012, over 3877031.60 frames. ], batch size: 84, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:57:27,136 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 23:57:30,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264480.0, ans=0.1 2024-08-09 23:57:34,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=264480.0, ans=0.07 2024-08-09 23:57:57,510 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 23:58:04,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=264680.0, ans=0.125 2024-08-09 23:58:17,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=15.0 2024-08-09 23:58:35,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12000, loss[loss=0.1157, beats_loss=0.01485, ecapa_loss=0.0003358, whisper_loss=0.09754, over 22132.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01246, ecapa_loss=0.0003128, whisper_loss=0.1013, over 3871199.38 frames. ], batch size: 91, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:58:35,891 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-09 23:59:06,857 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7232, 4.1162, 4.1128, 4.6593], device='cuda:3') 2024-08-09 23:59:15,224 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on ASR_libri: loss=0.2807, beats_loss=0, ecapa_loss=0.0009345, whisper_loss=0.2713, over 922467.00 frames. 2024-08-09 23:59:32,474 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on SV_voxceleb1: loss=0.008336, beats_loss=0, ecapa_loss=0.0008336, whisper_loss=0, over 939242.00 frames. 2024-08-10 00:01:27,316 INFO [train_multi_KD3.py:1149] (3/4) Epoch 2, validation on AT_audioset: loss=0.02968, beats_loss=0.02968, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 00:01:27,320 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 00:01:29,808 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.941e+01 3.442e+01 3.928e+01 6.406e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 00:01:39,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=264980.0, ans=0.0 2024-08-10 00:01:40,147 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 28 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 00:01:42,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. 
limit=15.0 2024-08-10 00:01:59,731 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 19 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-10 00:02:14,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=265280.0, ans=0.2 2024-08-10 00:02:17,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=265280.0, ans=0.0 2024-08-10 00:02:34,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265380.0, ans=0.1 2024-08-10 00:02:37,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12050, loss[loss=0.1258, beats_loss=0.009836, ecapa_loss=0.0003233, whisper_loss=0.1128, over 17394.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01243, ecapa_loss=0.0003136, whisper_loss=0.1017, over 3851082.28 frames. ], batch size: 65, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:02:53,534 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 00:03:03,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-08-10 00:03:18,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=265780.0, ans=0.125 2024-08-10 00:03:27,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=265780.0, ans=0.2 2024-08-10 00:03:37,932 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 00:03:39,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265880.0, ans=0.1 2024-08-10 00:03:47,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12100, loss[loss=0.1325, beats_loss=0.01346, ecapa_loss=0.000349, whisper_loss=0.1155, over 21250.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01241, ecapa_loss=0.0003153, whisper_loss=0.1021, over 3881382.97 frames. ], batch size: 88, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:03:50,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.134e+01 3.753e+01 4.563e+01 7.245e+01, threshold=7.507e+01, percent-clipped=1.0 2024-08-10 00:04:01,971 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 00:04:23,092 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 00:04:27,325 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 00:04:50,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=266380.0, ans=0.0 2024-08-10 00:04:56,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-10 00:04:57,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12150, loss[loss=0.09969, beats_loss=0.01184, ecapa_loss=0.0002576, whisper_loss=0.08527, over 15041.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.0124, ecapa_loss=0.0003142, whisper_loss=0.1017, over 3879675.65 frames. ], batch size: 56, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:05:17,633 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-10 00:05:25,088 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 00:05:38,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=266680.0, ans=0.5 2024-08-10 00:05:38,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=266680.0, ans=0.125 2024-08-10 00:05:49,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266780.0, ans=0.125 2024-08-10 00:05:51,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=266780.0, ans=0.2 2024-08-10 00:06:00,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=266880.0, ans=0.125 2024-08-10 00:06:03,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.30 vs. limit=15.0 2024-08-10 00:06:04,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=266880.0, ans=0.125 2024-08-10 00:06:08,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12200, loss[loss=0.08691, beats_loss=0.01712, ecapa_loss=0.0002774, whisper_loss=0.06702, over 21049.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01244, ecapa_loss=0.0003154, whisper_loss=0.1011, over 3892889.25 frames. ], batch size: 88, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:06:11,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.872e+01 3.325e+01 3.813e+01 6.794e+01, threshold=6.650e+01, percent-clipped=0.0 2024-08-10 00:06:22,759 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 00:06:27,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=267080.0, ans=0.2 2024-08-10 00:06:37,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-08-10 00:06:45,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267180.0, ans=0.0 2024-08-10 00:07:05,864 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 00:07:11,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=267380.0, ans=0.125 2024-08-10 00:07:18,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12250, loss[loss=0.1163, beats_loss=0.01462, ecapa_loss=0.0002899, whisper_loss=0.09877, over 22735.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01252, ecapa_loss=0.0003163, whisper_loss=0.1008, over 3899097.95 frames. ], batch size: 92, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:07:18,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=267480.0, ans=0.2 2024-08-10 00:07:18,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=267480.0, ans=0.125 2024-08-10 00:07:31,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=267580.0, ans=0.2 2024-08-10 00:07:35,239 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 00:07:35,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=267580.0, ans=0.025 2024-08-10 00:07:51,879 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 00:08:14,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-10 00:08:20,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=267880.0, ans=0.125 2024-08-10 00:08:27,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12300, loss[loss=0.124, beats_loss=0.01155, ecapa_loss=0.0002825, whisper_loss=0.1096, over 16356.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01247, ecapa_loss=0.000319, whisper_loss=0.1008, over 3913252.62 frames. ], batch size: 61, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:08:30,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.986e+01 3.586e+01 4.164e+01 6.809e+01, threshold=7.172e+01, percent-clipped=1.0 2024-08-10 00:08:34,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=267980.0, ans=0.0 2024-08-10 00:08:42,906 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 16 from LS+wenet, 31 from Vox, 33 from AS 2024-08-10 00:08:59,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=268180.0, ans=0.0 2024-08-10 00:09:07,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.99 vs.
limit=15.0 2024-08-10 00:09:12,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=268280.0, ans=0.07 2024-08-10 00:09:13,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268280.0, ans=0.125 2024-08-10 00:09:23,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=268380.0, ans=22.5 2024-08-10 00:09:27,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=268380.0, ans=0.1 2024-08-10 00:09:28,141 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 from AS 2024-08-10 00:09:36,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12350, loss[loss=0.09331, beats_loss=0.01548, ecapa_loss=0.000348, whisper_loss=0.07434, over 14953.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01248, ecapa_loss=0.0003224, whisper_loss=0.1014, over 3929796.84 frames. ], batch size: 62, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:09:47,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=15.0 2024-08-10 00:09:48,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=268480.0, ans=0.2 2024-08-10 00:09:49,460 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 21 from Vox, 45 from AS 2024-08-10 00:09:51,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5 2024-08-10 00:09:59,607 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-10 00:10:03,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=268680.0, ans=0.125 2024-08-10 00:10:09,584 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 00:10:13,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268680.0, ans=0.1 2024-08-10 00:10:26,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.18 vs. limit=22.5 2024-08-10 00:10:37,283 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 from AS 2024-08-10 00:10:40,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=268880.0, ans=0.0 2024-08-10 00:10:48,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12400, loss[loss=0.1053, beats_loss=0.01156, ecapa_loss=0.0004, whisper_loss=0.08974, over 20836.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01253, ecapa_loss=0.00032, whisper_loss=0.1009, over 3927380.56 frames. ], batch size: 89, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:10:48,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=268980.0, ans=0.1 2024-08-10 00:10:50,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=268980.0, ans=0.0 2024-08-10 00:10:50,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.997e+01 3.426e+01 4.143e+01 8.992e+01, threshold=6.852e+01, percent-clipped=1.0 2024-08-10 00:10:51,245 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
23 from LS+wenet, 12 from Vox, 27 from AS 2024-08-10 00:10:51,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=268980.0, ans=0.02 2024-08-10 00:11:03,625 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 00:11:20,028 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 16 from Vox, 37 from AS 2024-08-10 00:11:26,860 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 from AS 2024-08-10 00:11:28,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=269280.0, ans=0.0 2024-08-10 00:11:29,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.20 vs. limit=10.0 2024-08-10 00:11:34,294 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 from AS 2024-08-10 00:11:48,154 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 00:11:58,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12450, loss[loss=0.1002, beats_loss=0.01419, ecapa_loss=0.0002853, whisper_loss=0.08312, over 23191.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01243, ecapa_loss=0.0003197, whisper_loss=0.1013, over 3941713.97 frames. ], batch size: 93, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:12:01,073 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
20 from LS+wenet, 28 from Vox, 40 from AS 2024-08-10 00:12:02,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269480.0, ans=0.1 2024-08-10 00:12:02,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=269480.0, ans=0.2 2024-08-10 00:12:06,644 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 00:12:07,975 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 00:12:13,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.911e-01 2024-08-10 00:12:26,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=269680.0, ans=0.2 2024-08-10 00:12:26,995 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:12:32,441 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 00:12:46,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=269780.0, ans=0.2 2024-08-10 00:12:51,450 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 from AS 2024-08-10 00:12:56,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2024-08-10 00:13:08,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12500, loss[loss=0.1454, beats_loss=0.01028, ecapa_loss=0.0003386, whisper_loss=0.1317, over 21988.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01241, ecapa_loss=0.0003185, whisper_loss=0.1015, over 3919063.44 frames.
], batch size: 89, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:13:11,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.015e+01 3.443e+01 4.080e+01 3.263e+02, threshold=6.886e+01, percent-clipped=2.0 2024-08-10 00:13:14,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=269980.0, ans=0.0 2024-08-10 00:13:21,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=270080.0, ans=0.125 2024-08-10 00:13:29,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=270080.0, ans=0.2 2024-08-10 00:13:29,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-10 00:13:30,596 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 32 from Vox, 27 from AS 2024-08-10 00:13:34,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=270180.0, ans=0.125 2024-08-10 00:13:46,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270180.0, ans=0.1 2024-08-10 00:14:17,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12550, loss[loss=0.1252, beats_loss=0.0146, ecapa_loss=0.0002503, whisper_loss=0.1081, over 20799.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01239, ecapa_loss=0.0003193, whisper_loss=0.1017, over 3923561.71 frames.
], batch size: 80, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:14:28,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=270480.0, ans=0.125 2024-08-10 00:14:33,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=270580.0, ans=0.0 2024-08-10 00:14:35,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=270580.0, ans=0.125 2024-08-10 00:14:48,240 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS 2024-08-10 00:14:59,472 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 00:15:02,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=270780.0, ans=0.2 2024-08-10 00:15:10,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=270780.0, ans=0.0 2024-08-10 00:15:13,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=270880.0, ans=0.0 2024-08-10 00:15:13,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=270880.0, ans=0.125 2024-08-10 00:15:22,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=270880.0, ans=0.125 2024-08-10 00:15:23,936 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 40 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 00:15:25,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs.
limit=6.0 2024-08-10 00:15:27,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12600, loss[loss=0.1282, beats_loss=0.01072, ecapa_loss=0.0003178, whisper_loss=0.1143, over 17859.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01238, ecapa_loss=0.0003182, whisper_loss=0.102, over 3930616.29 frames. ], batch size: 70, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:15:30,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 3.077e+01 3.630e+01 3.984e+01 7.187e+01, threshold=7.260e+01, percent-clipped=1.0 2024-08-10 00:15:49,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=271080.0, ans=0.2 2024-08-10 00:15:50,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-10 00:16:00,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271180.0, ans=0.125 2024-08-10 00:16:12,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.81 vs. limit=22.5 2024-08-10 00:16:15,691 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 from AS 2024-08-10 00:16:20,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=15.0 2024-08-10 00:16:23,959 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 14 from Vox, 42 from AS 2024-08-10 00:16:28,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.51 vs.
limit=12.0 2024-08-10 00:16:37,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12650, loss[loss=0.1244, beats_loss=0.01179, ecapa_loss=0.0002909, whisper_loss=0.1097, over 19332.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01235, ecapa_loss=0.0003193, whisper_loss=0.1018, over 3903096.10 frames. ], batch size: 76, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:17:02,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0 2024-08-10 00:17:29,519 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 00:17:47,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12700, loss[loss=0.09241, beats_loss=0.01548, ecapa_loss=0.000341, whisper_loss=0.07353, over 18633.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01245, ecapa_loss=0.0003177, whisper_loss=0.1014, over 3914307.89 frames. ], batch size: 80, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:17:50,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 3.012e+01 3.366e+01 3.844e+01 6.101e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 00:17:50,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=271980.0, ans=0.125 2024-08-10 00:17:58,973 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 11 from Vox, 33 from AS 2024-08-10 00:18:00,349 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 00:18:00,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=272080.0, ans=0.125 2024-08-10 00:18:17,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=272180.0, ans=0.5 2024-08-10 00:18:19,779 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 from AS 2024-08-10 00:18:31,984 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 from AS 2024-08-10 00:18:33,278 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 00:18:33,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=272280.0, ans=0.5 2024-08-10 00:18:34,819 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 00:18:47,557 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 from AS 2024-08-10 00:18:56,168 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 00:18:57,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12750, loss[loss=0.1288, beats_loss=0.0116, ecapa_loss=0.0002977, whisper_loss=0.1142, over 23092.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01254, ecapa_loss=0.0003168, whisper_loss=0.1013, over 3922247.33 frames. ], batch size: 92, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:19:08,947 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
19 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 00:19:13,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272580.0, ans=0.1 2024-08-10 00:19:18,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=272580.0, ans=0.125 2024-08-10 00:19:22,027 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 35 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 00:19:22,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.18 vs. limit=15.0 2024-08-10 00:19:37,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=272680.0, ans=0.0 2024-08-10 00:19:49,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=272780.0, ans=0.125 2024-08-10 00:20:07,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12800, loss[loss=0.1236, beats_loss=0.01482, ecapa_loss=0.0002509, whisper_loss=0.1063, over 22392.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01255, ecapa_loss=0.000319, whisper_loss=0.1009, over 3932233.60 frames. ], batch size: 90, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:20:09,319 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 00:20:09,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272980.0, ans=0.125 2024-08-10 00:20:10,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 2.990e+01 3.546e+01 4.142e+01 8.927e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-10 00:20:13,501 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
16 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 00:20:21,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-08-10 00:20:30,558 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS 2024-08-10 00:20:30,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=273080.0, ans=0.025 2024-08-10 00:20:31,808 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 00:20:37,433 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 00:20:44,756 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 from AS 2024-08-10 00:20:45,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=273180.0, ans=0.2 2024-08-10 00:20:46,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=273180.0, ans=0.125 2024-08-10 00:21:10,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=273380.0, ans=0.0 2024-08-10 00:21:11,787 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 13 from Vox, 46 from AS 2024-08-10 00:21:12,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273380.0, ans=0.1 2024-08-10 00:21:13,211 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 from AS 2024-08-10 00:21:18,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12850, loss[loss=0.1236, beats_loss=0.0111, ecapa_loss=0.0003097, whisper_loss=0.1094, over 21458.00 frames.
], tot_loss[loss=0.1166, beats_loss=0.01258, ecapa_loss=0.0003159, whisper_loss=0.1009, over 3934014.06 frames. ], batch size: 88, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:21:21,039 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 from AS 2024-08-10 00:21:45,124 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 from AS 2024-08-10 00:22:11,667 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 20 from Vox, 50 from AS 2024-08-10 00:22:21,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=273880.0, ans=0.125 2024-08-10 00:22:23,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2024-08-10 00:22:24,549 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 from AS 2024-08-10 00:22:26,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=273880.0, ans=0.04949747468305833 2024-08-10 00:22:28,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12900, loss[loss=0.1018, beats_loss=0.01356, ecapa_loss=0.0002719, whisper_loss=0.08552, over 20659.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01268, ecapa_loss=0.0003138, whisper_loss=0.09989, over 3904382.02 frames. ], batch size: 81, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:22:31,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.013e+01 3.364e+01 3.931e+01 6.029e+01, threshold=6.729e+01, percent-clipped=0.0 2024-08-10 00:22:34,414 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
25 from LS+wenet, 26 from Vox, 42 from AS 2024-08-10 00:22:48,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2024-08-10 00:22:57,883 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 00:23:13,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=274280.0, ans=0.0 2024-08-10 00:23:26,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=274380.0, ans=0.125 2024-08-10 00:23:30,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=22.5 2024-08-10 00:23:38,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-10 00:23:38,995 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 00:23:40,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 12950, loss[loss=0.111, beats_loss=0.01415, ecapa_loss=0.0003488, whisper_loss=0.0934, over 17728.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01264, ecapa_loss=0.000313, whisper_loss=0.1004, over 3920972.98 frames. ], batch size: 75, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:23:47,628 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts.
23 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 00:23:55,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=274580.0, ans=0.125 2024-08-10 00:24:00,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274580.0, ans=0.1 2024-08-10 00:24:03,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-10 00:24:10,353 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 00:24:13,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=274680.0, ans=0.0 2024-08-10 00:24:25,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-10 00:24:30,836 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 00:24:50,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13000, loss[loss=0.1088, beats_loss=0.01501, ecapa_loss=0.00031, whisper_loss=0.09069, over 20898.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01265, ecapa_loss=0.0003135, whisper_loss=0.1001, over 3918051.50 frames. ], batch size: 87, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:24:53,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.907e+01 3.154e+01 3.704e+01 5.779e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 00:24:53,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=274980.0, ans=0.125 2024-08-10 00:24:56,201 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
22 from LS+wenet, 15 from Vox, 18 from AS 2024-08-10 00:25:07,947 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 00:25:17,466 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 22 from Vox, 17 from AS 2024-08-10 00:25:18,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=15.0 2024-08-10 00:25:53,022 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 from AS 2024-08-10 00:26:01,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13050, loss[loss=0.1156, beats_loss=0.01221, ecapa_loss=0.0002589, whisper_loss=0.1008, over 19322.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01264, ecapa_loss=0.0003125, whisper_loss=0.09997, over 3898402.15 frames. ], batch size: 74, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:26:26,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=275580.0, ans=10.0 2024-08-10 00:26:30,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=275680.0, ans=0.0 2024-08-10 00:26:38,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=275680.0, ans=0.0 2024-08-10 00:26:42,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=275680.0, ans=0.125 2024-08-10 00:26:46,160 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 00:26:51,577 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 24 from Vox, 24 from AS 2024-08-10 00:27:02,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.35 vs.
limit=12.0 2024-08-10 00:27:06,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=275880.0, ans=0.125 2024-08-10 00:27:07,088 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 00:27:11,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=275980.0, ans=0.125 2024-08-10 00:27:12,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13100, loss[loss=0.0994, beats_loss=0.01704, ecapa_loss=0.000251, whisper_loss=0.07985, over 16624.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01259, ecapa_loss=0.0003113, whisper_loss=0.09988, over 3882891.88 frames. ], batch size: 67, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:27:14,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.977e+01 3.328e+01 3.884e+01 7.929e+01, threshold=6.656e+01, percent-clipped=3.0 2024-08-10 00:27:22,437 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 00:27:29,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-10 00:27:35,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.12 vs. limit=15.0 2024-08-10 00:27:38,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=15.0 2024-08-10 00:27:49,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=276180.0, ans=0.125 2024-08-10 00:27:55,991 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
27 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 00:27:59,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=276280.0, ans=0.125 2024-08-10 00:28:02,195 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 00:28:23,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13150, loss[loss=0.1139, beats_loss=0.01259, ecapa_loss=0.0003805, whisper_loss=0.09747, over 21041.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01265, ecapa_loss=0.0003116, whisper_loss=0.09984, over 3877917.55 frames. ], batch size: 86, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:28:30,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-10 00:28:30,723 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 from AS 2024-08-10 00:28:31,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=276480.0, ans=0.125 2024-08-10 00:28:37,531 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-10 00:28:47,515 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 from AS 2024-08-10 00:28:51,688 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 00:28:57,362 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 from AS 2024-08-10 00:28:58,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=12.0 2024-08-10 00:29:02,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.04 vs.
limit=15.0 2024-08-10 00:29:04,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=276780.0, ans=0.07 2024-08-10 00:29:06,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=276780.0, ans=0.125 2024-08-10 00:29:07,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=276780.0, ans=0.0 2024-08-10 00:29:13,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=276780.0, ans=0.125 2024-08-10 00:29:18,213 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 36 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 00:29:24,890 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 00:29:32,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=276980.0, ans=0.2 2024-08-10 00:29:33,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13200, loss[loss=0.1283, beats_loss=0.008427, ecapa_loss=0.0003526, whisper_loss=0.1163, over 22407.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.0125, ecapa_loss=0.0003133, whisper_loss=0.1008, over 3881748.16 frames. ], batch size: 90, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:29:36,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 3.048e+01 3.557e+01 4.616e+01 6.724e+01, threshold=7.115e+01, percent-clipped=1.0 2024-08-10 00:29:51,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=277080.0, ans=15.0 2024-08-10 00:29:51,900 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 00:29:56,638 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
18 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-10 00:30:32,322 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 00:30:43,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13250, loss[loss=0.1098, beats_loss=0.01314, ecapa_loss=0.0003044, whisper_loss=0.09366, over 16579.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01247, ecapa_loss=0.000313, whisper_loss=0.101, over 3857876.38 frames. ], batch size: 64, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:30:46,290 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 00:30:50,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=277480.0, ans=0.05 2024-08-10 00:30:56,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:31:09,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=277580.0, ans=0.0 2024-08-10 00:31:25,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-10 00:31:28,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=277780.0, ans=0.0 2024-08-10 00:31:31,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=277780.0, ans=0.0 2024-08-10 00:31:44,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. 
limit=15.0 2024-08-10 00:31:54,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=277880.0, ans=0.0 2024-08-10 00:31:56,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13300, loss[loss=0.1224, beats_loss=0.009895, ecapa_loss=0.000318, whisper_loss=0.1093, over 16894.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01254, ecapa_loss=0.0003137, whisper_loss=0.1009, over 3875028.02 frames. ], batch size: 64, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:31:57,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=277980.0, ans=0.025 2024-08-10 00:31:58,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=277980.0, ans=0.125 2024-08-10 00:31:59,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.953e+01 3.236e+01 3.823e+01 6.068e+01, threshold=6.472e+01, percent-clipped=0.0 2024-08-10 00:32:14,595 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 00:32:19,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=278080.0, ans=0.2 2024-08-10 00:32:22,008 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
14 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 00:32:33,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=278180.0, ans=0.125 2024-08-10 00:32:42,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278280.0, ans=0.125 2024-08-10 00:32:48,872 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.629e-02 2024-08-10 00:32:53,404 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 00:32:55,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=278280.0, ans=0.125 2024-08-10 00:33:01,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-10 00:33:04,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-10 00:33:08,521 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 00:33:12,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=278380.0, ans=15.0 2024-08-10 00:33:14,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13350, loss[loss=0.1212, beats_loss=0.01308, ecapa_loss=0.0002597, whisper_loss=0.1056, over 15840.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01261, ecapa_loss=0.0003147, whisper_loss=0.1006, over 3857158.68 frames. 
], batch size: 61, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:33:17,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278480.0, ans=0.125 2024-08-10 00:33:36,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=278580.0, ans=0.2 2024-08-10 00:33:43,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2024-08-10 00:33:46,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=278680.0, ans=0.125 2024-08-10 00:33:51,156 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 00:34:09,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=278780.0, ans=0.07 2024-08-10 00:34:13,566 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 00:34:26,393 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 00:34:31,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13400, loss[loss=0.06663, beats_loss=0.01561, ecapa_loss=0.0003303, whisper_loss=0.04771, over 15037.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01258, ecapa_loss=0.0003135, whisper_loss=0.09946, over 3833992.55 frames. ], batch size: 65, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:34:33,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. 
limit=6.0 2024-08-10 00:34:34,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.868e+01 3.242e+01 3.595e+01 7.666e+01, threshold=6.483e+01, percent-clipped=2.0 2024-08-10 00:34:47,570 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 00:35:08,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279180.0, ans=0.1 2024-08-10 00:35:14,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=279180.0, ans=0.2 2024-08-10 00:35:29,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=279280.0, ans=0.125 2024-08-10 00:35:35,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=279380.0, ans=0.0 2024-08-10 00:35:37,798 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-10 00:35:38,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=279380.0, ans=0.0 2024-08-10 00:35:48,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13450, loss[loss=0.1327, beats_loss=0.01233, ecapa_loss=0.0003154, whisper_loss=0.1172, over 23512.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01258, ecapa_loss=0.0003135, whisper_loss=0.09922, over 3838099.02 frames. ], batch size: 92, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:35:56,305 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 00:36:02,597 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 00:36:38,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=279780.0, ans=0.0 2024-08-10 00:36:42,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=279780.0, ans=0.0 2024-08-10 00:36:42,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.02 vs. limit=22.5 2024-08-10 00:36:44,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=279780.0, ans=0.125 2024-08-10 00:36:45,400 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 00:36:47,032 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 00:37:06,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13500, loss[loss=0.1175, beats_loss=0.01263, ecapa_loss=0.0003143, whisper_loss=0.1017, over 23244.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01253, ecapa_loss=0.0003141, whisper_loss=0.0999, over 3839589.10 frames. ], batch size: 92, lr: 2.24e-02, grad_scale: 262144.0 2024-08-10 00:37:13,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.053e+01 3.516e+01 4.040e+01 7.643e+01, threshold=7.031e+01, percent-clipped=3.0 2024-08-10 00:37:16,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=279980.0, ans=0.0 2024-08-10 00:37:26,645 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-10 00:37:27,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=280080.0, ans=0.0 2024-08-10 00:37:28,216 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 00:37:30,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280080.0, ans=0.125 2024-08-10 00:37:37,540 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 00:37:44,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=280180.0, ans=0.0 2024-08-10 00:37:54,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=280280.0, ans=0.125 2024-08-10 00:38:01,794 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 00:38:14,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=280380.0, ans=0.0 2024-08-10 00:38:15,214 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 00:38:24,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13550, loss[loss=0.1005, beats_loss=0.0163, ecapa_loss=0.0002352, whisper_loss=0.0818, over 23540.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.0125, ecapa_loss=0.0003145, whisper_loss=0.09989, over 3867078.43 frames. ], batch size: 92, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:38:33,194 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 00:38:34,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=280480.0, ans=0.025 2024-08-10 00:38:34,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=280480.0, ans=0.0 2024-08-10 00:38:36,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=280480.0, ans=0.125 2024-08-10 00:38:45,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280580.0, ans=0.1 2024-08-10 00:38:50,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2024-08-10 00:39:01,883 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 00:39:11,259 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 00:39:24,859 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-10 00:39:25,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=280880.0, ans=0.125 2024-08-10 00:39:26,783 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-10 00:39:28,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=280880.0, ans=0.0 2024-08-10 00:39:33,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280880.0, ans=0.1 2024-08-10 00:39:41,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13600, loss[loss=0.1194, beats_loss=0.01222, ecapa_loss=0.0003287, whisper_loss=0.1039, over 21488.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01242, ecapa_loss=0.000313, whisper_loss=0.1003, over 3868085.01 frames. ], batch size: 86, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:39:42,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280980.0, ans=0.1 2024-08-10 00:39:44,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.967e+01 3.461e+01 3.946e+01 7.975e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-10 00:40:05,964 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 00:40:10,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=281080.0, ans=0.09899494936611666 2024-08-10 00:40:15,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=281180.0, ans=0.0 2024-08-10 00:40:17,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=281180.0, ans=0.0 2024-08-10 00:40:18,660 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 00:40:22,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. 
limit=15.0 2024-08-10 00:40:49,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-10 00:40:51,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=281380.0, ans=0.125 2024-08-10 00:41:00,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13650, loss[loss=0.1155, beats_loss=0.0105, ecapa_loss=0.0004185, whisper_loss=0.1009, over 16002.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01245, ecapa_loss=0.0003137, whisper_loss=0.09952, over 3832803.13 frames. ], batch size: 67, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:41:08,429 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 00:41:09,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-08-10 00:41:30,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=281580.0, ans=0.125 2024-08-10 00:41:36,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281680.0, ans=0.1 2024-08-10 00:41:45,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. 
limit=22.5 2024-08-10 00:41:51,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=281780.0, ans=0.125 2024-08-10 00:42:13,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=281880.0, ans=0.125 2024-08-10 00:42:22,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13700, loss[loss=0.1296, beats_loss=0.01181, ecapa_loss=0.0002767, whisper_loss=0.115, over 24522.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01236, ecapa_loss=0.0003132, whisper_loss=0.1008, over 3845714.07 frames. ], batch size: 93, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:42:25,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.951e+01 3.261e+01 3.919e+01 6.807e+01, threshold=6.522e+01, percent-clipped=0.0 2024-08-10 00:42:26,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=15.0 2024-08-10 00:42:42,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-10 00:42:47,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-10 00:43:04,852 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 00:43:09,050 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 00:43:18,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.60 vs. 
limit=15.0 2024-08-10 00:43:24,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=282280.0, ans=0.0 2024-08-10 00:43:31,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=282380.0, ans=0.125 2024-08-10 00:43:38,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2024-08-10 00:43:44,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13750, loss[loss=0.1233, beats_loss=0.009286, ecapa_loss=0.0002845, whisper_loss=0.1112, over 17204.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.0124, ecapa_loss=0.0003129, whisper_loss=0.1003, over 3848351.57 frames. ], batch size: 63, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:44:00,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=282580.0, ans=0.0 2024-08-10 00:44:05,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=282580.0, ans=0.125 2024-08-10 00:44:21,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=282680.0, ans=0.125 2024-08-10 00:44:32,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=282780.0, ans=0.0 2024-08-10 00:44:35,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.05 vs. limit=10.0 2024-08-10 00:44:36,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. 
limit=5.0 2024-08-10 00:44:43,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=282780.0, ans=0.0 2024-08-10 00:44:57,819 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 00:45:01,026 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 19 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 00:45:01,296 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:45:02,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13800, loss[loss=0.09594, beats_loss=0.01535, ecapa_loss=0.0003217, whisper_loss=0.07737, over 20580.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01246, ecapa_loss=0.0003113, whisper_loss=0.1004, over 3853769.51 frames. ], batch size: 85, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:45:06,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.944e+01 3.294e+01 3.829e+01 5.391e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-10 00:45:08,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=282980.0, ans=0.2 2024-08-10 00:45:13,485 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 00:45:16,043 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 00:45:24,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283080.0, ans=0.1 2024-08-10 00:45:36,851 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 00:45:36,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=283180.0, ans=0.1 2024-08-10 00:45:38,834 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 00:46:06,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=283280.0, ans=0.0 2024-08-10 00:46:08,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=283280.0, ans=0.125 2024-08-10 00:46:14,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=283380.0, ans=0.0 2024-08-10 00:46:25,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13850, loss[loss=0.1123, beats_loss=0.01192, ecapa_loss=0.0004208, whisper_loss=0.09616, over 17512.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01239, ecapa_loss=0.0003129, whisper_loss=0.1013, over 3847111.61 frames. ], batch size: 74, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:46:40,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283580.0, ans=0.1 2024-08-10 00:47:01,885 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 00:47:03,130 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
37 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 00:47:08,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=283680.0, ans=0.09899494936611666 2024-08-10 00:47:12,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=283780.0, ans=0.125 2024-08-10 00:47:14,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283780.0, ans=0.1 2024-08-10 00:47:20,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=283780.0, ans=0.05 2024-08-10 00:47:24,707 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.835e+00 2024-08-10 00:47:33,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=283880.0, ans=0.125 2024-08-10 00:47:33,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=283880.0, ans=0.125 2024-08-10 00:47:44,366 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-10 00:47:47,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13900, loss[loss=0.1061, beats_loss=0.01215, ecapa_loss=0.0003285, whisper_loss=0.09071, over 17078.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01236, ecapa_loss=0.0003116, whisper_loss=0.1021, over 3860157.33 frames. 
], batch size: 67, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:47:48,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=283980.0, ans=0.0 2024-08-10 00:47:50,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.945e+01 3.348e+01 3.878e+01 5.863e+01, threshold=6.696e+01, percent-clipped=0.0 2024-08-10 00:48:07,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-08-10 00:48:11,656 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 00:48:27,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-10 00:48:48,935 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 00:49:02,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=284380.0, ans=0.125 2024-08-10 00:49:05,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=284380.0, ans=0.125 2024-08-10 00:49:09,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 13950, loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0004037, whisper_loss=0.08904, over 16297.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01241, ecapa_loss=0.0003106, whisper_loss=0.1017, over 3862700.64 frames. ], batch size: 68, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:49:20,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. 
limit=15.0 2024-08-10 00:49:37,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284580.0, ans=0.125 2024-08-10 00:49:48,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=284680.0, ans=0.125 2024-08-10 00:49:59,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=284780.0, ans=0.2 2024-08-10 00:50:31,256 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 00:50:33,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14000, loss[loss=0.1202, beats_loss=0.01031, ecapa_loss=0.0002867, whisper_loss=0.107, over 22689.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01242, ecapa_loss=0.000309, whisper_loss=0.1016, over 3858895.34 frames. ], batch size: 90, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:50:35,934 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.957e+01 3.357e+01 3.952e+01 6.248e+01, threshold=6.715e+01, percent-clipped=0.0 2024-08-10 00:50:40,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.90 vs. limit=15.0 2024-08-10 00:50:43,146 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 00:51:01,450 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 00:51:16,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=285180.0, ans=0.2 2024-08-10 00:51:40,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-10 00:51:43,002 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 00:51:54,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14050, loss[loss=0.1305, beats_loss=0.01234, ecapa_loss=0.0003218, whisper_loss=0.1149, over 21213.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01231, ecapa_loss=0.0003092, whisper_loss=0.1021, over 3844455.45 frames. ], batch size: 85, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:52:28,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=285680.0, ans=0.125 2024-08-10 00:53:05,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=12.0 2024-08-10 00:53:10,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=285880.0, ans=0.0 2024-08-10 00:53:15,401 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14100, loss[loss=0.1206, beats_loss=0.01193, ecapa_loss=0.000315, whisper_loss=0.1055, over 16550.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01242, ecapa_loss=0.00031, whisper_loss=0.1019, over 3852599.82 frames. 
], batch size: 65, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:53:18,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.998e+01 3.654e+01 4.043e+01 1.341e+02, threshold=7.307e+01, percent-clipped=1.0 2024-08-10 00:53:24,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=285980.0, ans=0.07 2024-08-10 00:53:30,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=286080.0, ans=0.125 2024-08-10 00:53:46,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=286080.0, ans=0.0 2024-08-10 00:53:49,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2024-08-10 00:54:17,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286280.0, ans=0.1 2024-08-10 00:54:30,193 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 00:54:35,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14150, loss[loss=0.1287, beats_loss=0.01178, ecapa_loss=0.0003677, whisper_loss=0.1133, over 21708.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0125, ecapa_loss=0.0003125, whisper_loss=0.1005, over 3851079.74 frames. ], batch size: 88, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:54:36,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=286480.0, ans=0.125 2024-08-10 00:54:48,542 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 00:54:50,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=286480.0, ans=10.0 2024-08-10 00:54:57,393 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 00:54:59,199 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 00:55:05,636 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 00:55:24,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=286780.0, ans=0.0 2024-08-10 00:55:26,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=286780.0, ans=0.04949747468305833 2024-08-10 00:55:27,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-10 00:55:44,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-10 00:55:48,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286880.0, ans=0.125 2024-08-10 00:55:53,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14200, loss[loss=0.1167, beats_loss=0.01311, ecapa_loss=0.0003241, whisper_loss=0.1004, over 18887.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0125, ecapa_loss=0.0003099, whisper_loss=0.1005, over 3859355.26 frames. ], batch size: 76, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:55:57,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. 
limit=15.0 2024-08-10 00:55:58,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 3.000e+01 3.388e+01 3.894e+01 5.742e+01, threshold=6.776e+01, percent-clipped=0.0 2024-08-10 00:56:34,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2024-08-10 00:56:40,380 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 00:56:47,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=287180.0, ans=0.125 2024-08-10 00:56:52,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=287180.0, ans=0.125 2024-08-10 00:57:06,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=287280.0, ans=0.0 2024-08-10 00:57:08,282 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 00:57:21,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.89 vs. limit=15.0 2024-08-10 00:57:27,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-10 00:57:32,752 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 00:57:38,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14250, loss[loss=0.1375, beats_loss=0.01162, ecapa_loss=0.0002671, whisper_loss=0.1233, over 19690.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01249, ecapa_loss=0.0003104, whisper_loss=0.1009, over 3880273.63 frames. 
], batch size: 74, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:57:47,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=287480.0, ans=0.0 2024-08-10 00:57:49,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287480.0, ans=0.1 2024-08-10 00:58:33,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=287780.0, ans=0.125 2024-08-10 00:58:54,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.37 vs. limit=10.0 2024-08-10 00:58:56,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=287880.0, ans=0.125 2024-08-10 00:59:14,155 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14300, loss[loss=0.1121, beats_loss=0.01145, ecapa_loss=0.0003532, whisper_loss=0.09711, over 18961.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.0125, ecapa_loss=0.0003099, whisper_loss=0.1006, over 3883211.85 frames. ], batch size: 77, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:59:15,284 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 00:59:19,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+01 3.147e+01 3.620e+01 4.133e+01 1.421e+02, threshold=7.240e+01, percent-clipped=1.0 2024-08-10 00:59:23,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. 
limit=15.0 2024-08-10 00:59:39,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=288080.0, ans=0.125 2024-08-10 00:59:46,227 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 01:00:08,477 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 01:00:11,737 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 01:00:25,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-10 01:00:55,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2024-08-10 01:01:02,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.15 vs. limit=22.5 2024-08-10 01:01:12,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14350, loss[loss=0.1235, beats_loss=0.01233, ecapa_loss=0.0003549, whisper_loss=0.1076, over 21968.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01242, ecapa_loss=0.0003118, whisper_loss=0.1009, over 3904001.22 frames. ], batch size: 91, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:01:26,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288480.0, ans=0.1 2024-08-10 01:01:38,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=288580.0, ans=0.95 2024-08-10 01:02:02,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. 
limit=22.5 2024-08-10 01:02:07,315 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 01:02:33,826 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 01:03:08,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14400, loss[loss=0.1172, beats_loss=0.01178, ecapa_loss=0.0003112, whisper_loss=0.1023, over 19039.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01251, ecapa_loss=0.0003107, whisper_loss=0.1009, over 3911590.54 frames. ], batch size: 73, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:03:13,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.997e+01 3.365e+01 3.798e+01 7.821e+01, threshold=6.729e+01, percent-clipped=1.0 2024-08-10 01:03:16,789 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 01:03:29,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=288980.0, ans=0.125 2024-08-10 01:03:51,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. 
limit=15.0 2024-08-10 01:04:03,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=289180.0, ans=0.0 2024-08-10 01:04:05,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=289180.0, ans=0.0 2024-08-10 01:04:13,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289280.0, ans=0.125 2024-08-10 01:04:13,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=289280.0, ans=0.025 2024-08-10 01:04:15,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=289280.0, ans=0.5 2024-08-10 01:04:32,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=289380.0, ans=0.0 2024-08-10 01:04:45,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 2, batch 14450, loss[loss=0.1067, beats_loss=0.0125, ecapa_loss=0.0003221, whisper_loss=0.091, over 16148.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.0125, ecapa_loss=0.0003106, whisper_loss=0.1011, over 3904743.13 frames. ], batch size: 64, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:04:46,802 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 01:04:47,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=289480.0, ans=0.125 2024-08-10 01:04:50,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=289480.0, ans=0.5 2024-08-10 01:04:56,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=289480.0, ans=0.1 2024-08-10 01:05:01,241 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 01:05:02,905 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 01:05:10,837 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 01:05:15,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289680.0, ans=0.125 2024-08-10 01:06:23,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 0, loss[loss=0.1206, beats_loss=0.01448, ecapa_loss=0.000335, whisper_loss=0.1028, over 21575.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01448, ecapa_loss=0.000335, whisper_loss=0.1028, over 21575.00 frames. ], batch size: 87, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:06:23,825 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 01:07:07,578 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on ASR_libri: loss=0.2782, beats_loss=0, ecapa_loss=0.0009143, whisper_loss=0.2691, over 922467.00 frames. 2024-08-10 01:07:23,532 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on SV_voxceleb1: loss=0.008083, beats_loss=0, ecapa_loss=0.0008083, whisper_loss=0, over 939242.00 frames. 
2024-08-10 01:09:28,028 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on AT_audioset: loss=0.02889, beats_loss=0.02889, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 01:09:28,031 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 01:09:29,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=289880.0, ans=0.0 2024-08-10 01:09:32,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=289880.0, ans=10.0 2024-08-10 01:09:48,269 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-10 01:09:49,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=289880.0, ans=0.2 2024-08-10 01:10:02,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 3.015e+01 3.420e+01 3.932e+01 5.377e+01, threshold=6.841e+01, percent-clipped=0.0 2024-08-10 01:10:21,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=289980.0, ans=0.125 2024-08-10 01:10:22,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=289980.0, ans=0.0 2024-08-10 01:10:29,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=290080.0, ans=0.0 2024-08-10 01:10:36,950 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 01:10:58,976 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 01:11:05,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. 
limit=15.0 2024-08-10 01:11:28,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.44 vs. limit=22.5 2024-08-10 01:11:41,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=290380.0, ans=0.125 2024-08-10 01:11:42,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 50, loss[loss=0.1186, beats_loss=0.01114, ecapa_loss=0.0003411, whisper_loss=0.104, over 19292.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01276, ecapa_loss=0.0003132, whisper_loss=0.1001, over 894937.37 frames. ], batch size: 77, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:11:44,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=290380.0, ans=0.125 2024-08-10 01:11:57,512 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 01:11:59,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=290380.0, ans=0.0 2024-08-10 01:12:06,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=290480.0, ans=0.125 2024-08-10 01:12:27,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=290480.0, ans=0.0 2024-08-10 01:12:43,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290580.0, ans=0.1 2024-08-10 01:12:43,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290580.0, ans=0.1 2024-08-10 01:13:26,777 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 01:13:28,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=290780.0, ans=0.025 2024-08-10 01:13:32,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=15.0 2024-08-10 01:13:48,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 100, loss[loss=0.09623, beats_loss=0.01278, ecapa_loss=0.0002202, whisper_loss=0.08125, over 16491.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01249, ecapa_loss=0.0003101, whisper_loss=0.09892, over 1546840.29 frames. ], batch size: 63, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:13:53,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-10 01:13:53,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. 
limit=15.0 2024-08-10 01:13:55,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=290880.0, ans=0.0 2024-08-10 01:14:14,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=290980.0, ans=15.0 2024-08-10 01:14:18,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.835e+01 4.447e+01 6.801e+01, threshold=7.671e+01, percent-clipped=0.0 2024-08-10 01:14:39,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=291080.0, ans=0.1 2024-08-10 01:15:04,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=291180.0, ans=0.125 2024-08-10 01:15:11,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2024-08-10 01:15:36,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=291280.0, ans=0.125 2024-08-10 01:15:36,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=291280.0, ans=0.0 2024-08-10 01:15:42,873 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 01:15:44,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 150, loss[loss=0.1039, beats_loss=0.01203, ecapa_loss=0.0003274, whisper_loss=0.0886, over 22043.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01239, ecapa_loss=0.0003042, whisper_loss=0.09939, over 2019684.38 frames. ], batch size: 89, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:15:46,113 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 01:15:58,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291380.0, ans=0.0 2024-08-10 01:15:59,484 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 01:16:09,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=291480.0, ans=0.125 2024-08-10 01:16:29,774 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 01:16:34,997 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 01:16:48,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=291680.0, ans=0.0 2024-08-10 01:17:11,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 200, loss[loss=0.1081, beats_loss=0.01225, ecapa_loss=0.0003327, whisper_loss=0.09252, over 21806.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01222, ecapa_loss=0.0003027, whisper_loss=0.09974, over 2407200.97 frames. ], batch size: 91, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:17:31,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.029e+01 3.361e+01 3.912e+01 9.673e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 01:17:44,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=292080.0, ans=0.0 2024-08-10 01:17:50,468 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 01:18:31,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 250, loss[loss=0.1035, beats_loss=0.01331, ecapa_loss=0.0003493, whisper_loss=0.08672, over 17684.00 frames. 
], tot_loss[loss=0.1152, beats_loss=0.0122, ecapa_loss=0.0003021, whisper_loss=0.09996, over 2715683.85 frames. ], batch size: 75, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:19:00,311 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-10 01:19:00,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=292580.0, ans=0.0 2024-08-10 01:19:17,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=292680.0, ans=0.125 2024-08-10 01:19:19,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=292680.0, ans=0.0 2024-08-10 01:19:19,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=12.0 2024-08-10 01:19:23,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=292680.0, ans=0.0 2024-08-10 01:19:40,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=292780.0, ans=0.125 2024-08-10 01:19:40,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292780.0, ans=0.125 2024-08-10 01:19:44,512 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 01:19:47,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 300, loss[loss=0.1445, beats_loss=0.01095, ecapa_loss=0.0003005, whisper_loss=0.1306, over 20993.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01217, ecapa_loss=0.0003003, whisper_loss=0.09978, over 2960690.12 frames. ], batch size: 80, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:19:58,913 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 01:20:01,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=15.0 2024-08-10 01:20:06,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.157e+01 3.521e+01 4.168e+01 6.266e+01, threshold=7.043e+01, percent-clipped=0.0 2024-08-10 01:20:33,803 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 01:20:36,955 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 01:21:06,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 350, loss[loss=0.1, beats_loss=0.01475, ecapa_loss=0.000314, whisper_loss=0.08215, over 19307.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01223, ecapa_loss=0.0002969, whisper_loss=0.09902, over 3125907.13 frames. ], batch size: 82, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:21:10,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-08-10 01:21:35,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=293480.0, ans=0.2 2024-08-10 01:21:37,355 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 01:21:42,587 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
31 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 01:21:45,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=293580.0, ans=0.125 2024-08-10 01:21:54,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=293680.0, ans=0.125 2024-08-10 01:22:04,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=293680.0, ans=0.0 2024-08-10 01:22:05,468 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 01:22:06,912 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 8 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 01:22:21,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 400, loss[loss=0.1217, beats_loss=0.01361, ecapa_loss=0.0002752, whisper_loss=0.1053, over 22194.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0123, ecapa_loss=0.0002969, whisper_loss=0.09836, over 3258376.08 frames. ], batch size: 89, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:22:32,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. 
limit=15.0 2024-08-10 01:22:38,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=293980.0, ans=0.2 2024-08-10 01:22:39,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.898e+01 3.177e+01 4.000e+01 8.293e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 01:22:46,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=293980.0, ans=0.125 2024-08-10 01:22:56,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=294080.0, ans=0.125 2024-08-10 01:22:58,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=294080.0, ans=0.125 2024-08-10 01:22:58,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=294080.0, ans=0.0 2024-08-10 01:23:07,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294180.0, ans=0.125 2024-08-10 01:23:09,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=22.5 2024-08-10 01:23:29,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294280.0, ans=0.125 2024-08-10 01:23:37,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 450, loss[loss=0.1404, beats_loss=0.01194, ecapa_loss=0.0003002, whisper_loss=0.1254, over 23567.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0122, ecapa_loss=0.0002957, whisper_loss=0.0984, over 3382806.99 frames. 
], batch size: 92, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:23:46,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=294380.0, ans=0.0 2024-08-10 01:23:53,569 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 01:23:55,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-10 01:24:01,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=294480.0, ans=0.125 2024-08-10 01:24:22,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2024-08-10 01:24:28,145 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 01:24:47,460 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 01:24:52,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 500, loss[loss=0.1155, beats_loss=0.01153, ecapa_loss=0.0003087, whisper_loss=0.1009, over 17776.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01222, ecapa_loss=0.0002936, whisper_loss=0.09867, over 3463203.16 frames. ], batch size: 71, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:24:53,458 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-10 01:24:57,916 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 01:25:09,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.966e+01 3.370e+01 3.826e+01 6.580e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-10 01:25:20,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=295080.0, ans=0.0 2024-08-10 01:25:40,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=295180.0, ans=0.125 2024-08-10 01:25:44,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=295180.0, ans=0.0 2024-08-10 01:25:48,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=295180.0, ans=0.0 2024-08-10 01:25:54,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295280.0, ans=0.1 2024-08-10 01:26:00,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=295280.0, ans=10.0 2024-08-10 01:26:05,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 550, loss[loss=0.1242, beats_loss=0.01234, ecapa_loss=0.0002836, whisper_loss=0.109, over 23557.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01211, ecapa_loss=0.0002961, whisper_loss=0.09895, over 3528381.52 frames. ], batch size: 90, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:26:08,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295380.0, ans=0.1 2024-08-10 01:26:25,592 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
21 from LS+wenet, 13 from Vox, 24 from AS 2024-08-10 01:26:31,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=295480.0, ans=0.125 2024-08-10 01:26:33,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=295480.0, ans=0.0 2024-08-10 01:26:36,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=295580.0, ans=0.125 2024-08-10 01:27:06,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=295780.0, ans=0.125 2024-08-10 01:27:14,717 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 01:27:20,960 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 600, loss[loss=0.1359, beats_loss=0.009864, ecapa_loss=0.0002579, whisper_loss=0.1234, over 19818.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01222, ecapa_loss=0.000292, whisper_loss=0.09868, over 3599335.52 frames. ], batch size: 73, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:27:21,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=295880.0, ans=0.125 2024-08-10 01:27:34,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=295980.0, ans=0.0 2024-08-10 01:27:38,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.875e+01 3.342e+01 3.961e+01 6.306e+01, threshold=6.685e+01, percent-clipped=0.0 2024-08-10 01:28:01,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=296080.0, ans=0.125 2024-08-10 01:28:07,088 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
24 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 01:28:08,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=22.5 2024-08-10 01:28:19,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=296280.0, ans=0.0 2024-08-10 01:28:30,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=296280.0, ans=12.0 2024-08-10 01:28:36,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 650, loss[loss=0.1208, beats_loss=0.01266, ecapa_loss=0.0003064, whisper_loss=0.105, over 18567.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01225, ecapa_loss=0.0002904, whisper_loss=0.09837, over 3647177.48 frames. ], batch size: 74, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:28:37,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=296380.0, ans=0.125 2024-08-10 01:28:38,623 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 from AS 2024-08-10 01:28:39,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=296380.0, ans=0.0 2024-08-10 01:28:59,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=296480.0, ans=0.125 2024-08-10 01:29:11,037 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 01:29:15,194 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
21 from LS+wenet, 13 from Vox, 24 from AS 2024-08-10 01:29:15,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296580.0, ans=0.1 2024-08-10 01:29:21,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=296680.0, ans=0.125 2024-08-10 01:29:26,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296680.0, ans=0.1 2024-08-10 01:29:26,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2024-08-10 01:29:48,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 700, loss[loss=0.09087, beats_loss=0.01237, ecapa_loss=0.0002669, whisper_loss=0.07583, over 13657.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01217, ecapa_loss=0.000291, whisper_loss=0.09921, over 3668422.81 frames. ], batch size: 53, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:29:49,148 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 from AS 2024-08-10 01:29:50,342 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 01:29:58,148 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 01:30:00,077 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
25 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 01:30:02,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=296880.0, ans=0.125 2024-08-10 01:30:07,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.824e+01 3.267e+01 4.012e+01 5.256e+01, threshold=6.535e+01, percent-clipped=0.0 2024-08-10 01:30:07,697 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 01:30:21,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=297080.0, ans=0.0 2024-08-10 01:30:32,109 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 from AS 2024-08-10 01:30:40,844 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 01:30:44,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.14 vs. limit=22.5 2024-08-10 01:30:45,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=297180.0, ans=0.125 2024-08-10 01:30:50,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=297280.0, ans=0.0 2024-08-10 01:30:59,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=297280.0, ans=0.1 2024-08-10 01:30:59,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297280.0, ans=0.1 2024-08-10 01:31:00,563 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
22 from LS+wenet, 25 from Vox, 25 from AS 2024-08-10 01:31:04,546 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.160e-01 2024-08-10 01:31:05,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 750, loss[loss=0.1135, beats_loss=0.01019, ecapa_loss=0.0003669, whisper_loss=0.09966, over 21378.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01225, ecapa_loss=0.000289, whisper_loss=0.0994, over 3733364.89 frames. ], batch size: 91, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:31:09,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=297380.0, ans=0.0 2024-08-10 01:31:17,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297380.0, ans=0.1 2024-08-10 01:31:56,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-10 01:32:01,075 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 15 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 01:32:18,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 800, loss[loss=0.1127, beats_loss=0.014, ecapa_loss=0.0003026, whisper_loss=0.09562, over 22281.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01229, ecapa_loss=0.0002872, whisper_loss=0.0987, over 3747008.06 frames. ], batch size: 93, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:32:26,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-10 01:32:27,917 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 from AS 2024-08-10 01:32:29,165 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
25 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 01:32:32,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=297980.0, ans=0.0 2024-08-10 01:32:35,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.843e+01 3.241e+01 3.911e+01 6.650e+01, threshold=6.482e+01, percent-clipped=1.0 2024-08-10 01:33:03,066 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 01:33:06,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2024-08-10 01:33:08,873 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 13 from Vox, 28 from AS 2024-08-10 01:33:10,607 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 01:33:14,800 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 21 from Vox, 46 from AS 2024-08-10 01:33:18,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=298280.0, ans=0.2 2024-08-10 01:33:22,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=298280.0, ans=0.2 2024-08-10 01:33:26,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-10 01:33:29,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=298280.0, ans=0.125 2024-08-10 01:33:33,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 850, loss[loss=0.1168, beats_loss=0.01176, ecapa_loss=0.0002755, whisper_loss=0.1023, over 17726.00 frames.
], tot_loss[loss=0.113, beats_loss=0.01235, ecapa_loss=0.0002857, whisper_loss=0.09784, over 3764279.51 frames. ], batch size: 67, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:33:49,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=298480.0, ans=0.0 2024-08-10 01:34:09,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=298580.0, ans=0.2 2024-08-10 01:34:11,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=12.0 2024-08-10 01:34:16,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=298580.0, ans=0.125 2024-08-10 01:34:19,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298680.0, ans=0.1 2024-08-10 01:34:26,710 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 01:34:48,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 900, loss[loss=0.09565, beats_loss=0.0149, ecapa_loss=0.0002688, whisper_loss=0.07806, over 20705.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01234, ecapa_loss=0.0002853, whisper_loss=0.09807, over 3773087.12 frames.
], batch size: 87, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:34:59,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=298880.0, ans=0.125 2024-08-10 01:35:06,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.811e+01 3.274e+01 3.784e+01 5.899e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-10 01:35:11,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=298980.0, ans=0.2 2024-08-10 01:35:25,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2024-08-10 01:35:51,652 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 from AS 2024-08-10 01:36:03,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 950, loss[loss=0.1132, beats_loss=0.01321, ecapa_loss=0.000231, whisper_loss=0.09765, over 21207.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01232, ecapa_loss=0.0002835, whisper_loss=0.09818, over 3759501.52 frames. ], batch size: 80, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:36:13,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=299380.0, ans=0.0 2024-08-10 01:36:17,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=299480.0, ans=0.0 2024-08-10 01:36:26,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=299480.0, ans=0.015 2024-08-10 01:36:27,841 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
29 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 01:37:00,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299680.0, ans=0.125 2024-08-10 01:37:01,252 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 from AS 2024-08-10 01:37:02,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=299680.0, ans=0.09899494936611666 2024-08-10 01:37:18,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-08-10 01:37:18,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1000, loss[loss=0.1231, beats_loss=0.01231, ecapa_loss=0.0003083, whisper_loss=0.1077, over 22206.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01233, ecapa_loss=0.0002842, whisper_loss=0.09866, over 3779167.51 frames. ], batch size: 89, lr: 2.07e-02, grad_scale: 524288.0 2024-08-10 01:37:37,609 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.926e+01 3.322e+01 3.689e+01 5.712e+01, threshold=6.643e+01, percent-clipped=0.0 2024-08-10 01:37:43,542 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 from AS 2024-08-10 01:38:00,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=300080.0, ans=0.0 2024-08-10 01:38:03,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-10 01:38:05,677 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
16 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 01:38:18,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=300280.0, ans=0.0 2024-08-10 01:38:31,420 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 01:38:34,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1050, loss[loss=0.1424, beats_loss=0.009143, ecapa_loss=0.0003098, whisper_loss=0.1301, over 18147.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01232, ecapa_loss=0.0002824, whisper_loss=0.09882, over 3761945.83 frames. ], batch size: 68, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:38:35,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=22.5 2024-08-10 01:38:37,176 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 24 from Vox, 25 from AS 2024-08-10 01:38:51,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=300480.0, ans=0.0 2024-08-10 01:38:53,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-08-10 01:39:04,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=300580.0, ans=0.2 2024-08-10 01:39:22,472 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.731e-03 2024-08-10 01:39:24,645 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 28 from Vox, 20 from AS 2024-08-10 01:39:50,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1100, loss[loss=0.1061, beats_loss=0.013, ecapa_loss=0.0002576, whisper_loss=0.0905, over 22716.00 frames.
], tot_loss[loss=0.1144, beats_loss=0.01233, ecapa_loss=0.0002833, whisper_loss=0.09926, over 3777366.76 frames. ], batch size: 91, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:39:58,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300880.0, ans=0.1 2024-08-10 01:40:07,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=300980.0, ans=0.125 2024-08-10 01:40:08,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.873e+01 3.261e+01 3.724e+01 5.464e+01, threshold=6.522e+01, percent-clipped=0.0 2024-08-10 01:40:08,697 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 from AS 2024-08-10 01:40:17,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-08-10 01:40:43,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=301180.0, ans=0.025 2024-08-10 01:41:02,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=301280.0, ans=0.125 2024-08-10 01:41:03,183 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 from AS 2024-08-10 01:41:04,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1150, loss[loss=0.1117, beats_loss=0.01335, ecapa_loss=0.0002448, whisper_loss=0.09595, over 19900.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0124, ecapa_loss=0.0002822, whisper_loss=0.09887, over 3780446.61 frames.
], batch size: 80, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:41:25,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=301480.0, ans=0.125 2024-08-10 01:41:29,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=301480.0, ans=0.0 2024-08-10 01:41:34,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=301580.0, ans=0.0 2024-08-10 01:41:53,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-08-10 01:42:02,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=301780.0, ans=0.125 2024-08-10 01:42:12,729 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 01:42:19,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1200, loss[loss=0.09719, beats_loss=0.01123, ecapa_loss=0.000264, whisper_loss=0.08332, over 19021.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0124, ecapa_loss=0.0002807, whisper_loss=0.0984, over 3796837.53 frames. ], batch size: 74, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:42:24,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs.
limit=10.0 2024-08-10 01:42:28,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=301880.0, ans=0.2 2024-08-10 01:42:33,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=301980.0, ans=0.0 2024-08-10 01:42:33,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301980.0, ans=0.1 2024-08-10 01:42:36,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.802e+01 3.225e+01 3.750e+01 6.302e+01, threshold=6.450e+01, percent-clipped=0.0 2024-08-10 01:42:40,195 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-10 01:42:53,398 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 01:43:02,297 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 01:43:05,856 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 01:43:18,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=302280.0, ans=0.0 2024-08-10 01:43:27,668 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 from AS 2024-08-10 01:43:33,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1250, loss[loss=0.1139, beats_loss=0.01257, ecapa_loss=0.0002256, whisper_loss=0.09903, over 20071.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01241, ecapa_loss=0.0002787, whisper_loss=0.09902, over 3829345.31 frames. ], batch size: 77, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:43:36,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs.
limit=22.5 2024-08-10 01:43:44,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=302380.0, ans=0.2 2024-08-10 01:43:48,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=302480.0, ans=0.125 2024-08-10 01:43:53,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302480.0, ans=0.125 2024-08-10 01:44:06,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=302580.0, ans=0.025 2024-08-10 01:44:11,095 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 01:44:16,427 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 01:44:25,722 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 01:44:27,553 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 from AS 2024-08-10 01:44:36,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=302780.0, ans=0.125 2024-08-10 01:44:48,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1300, loss[loss=0.09676, beats_loss=0.01469, ecapa_loss=0.0002029, whisper_loss=0.08004, over 21634.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01237, ecapa_loss=0.000279, whisper_loss=0.09919, over 3848711.91 frames.
], batch size: 84, lr: 2.06e-02, grad_scale: 1048576.0 2024-08-10 01:45:08,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.884e+01 3.264e+01 3.595e+01 5.329e+01, threshold=6.528e+01, percent-clipped=0.0 2024-08-10 01:45:11,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302980.0, ans=0.125 2024-08-10 01:45:11,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=302980.0, ans=0.0 2024-08-10 01:45:22,932 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-10 01:45:31,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=303080.0, ans=0.125 2024-08-10 01:45:32,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.28 vs. limit=10.0 2024-08-10 01:45:40,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=303180.0, ans=0.5 2024-08-10 01:45:53,745 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 01:45:55,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303280.0, ans=0.125 2024-08-10 01:45:59,452 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 28 from LS+wenet, 10 from Vox, 21 from AS 2024-08-10 01:45:59,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=303280.0, ans=0.125 2024-08-10 01:46:02,214 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
23 from LS+wenet, 33 from Vox, 36 from AS 2024-08-10 01:46:02,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303280.0, ans=0.1 2024-08-10 01:46:10,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1350, loss[loss=0.12, beats_loss=0.0138, ecapa_loss=0.0002949, whisper_loss=0.1032, over 17524.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.0002805, whisper_loss=0.09893, over 3859240.80 frames. ], batch size: 70, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:46:13,951 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 from AS 2024-08-10 01:46:24,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=303480.0, ans=0.05 2024-08-10 01:46:52,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=303580.0, ans=0.0 2024-08-10 01:46:57,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=303680.0, ans=0.125 2024-08-10 01:47:03,374 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
22 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 01:47:13,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=303780.0, ans=0.125 2024-08-10 01:47:15,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303780.0, ans=0.125 2024-08-10 01:47:20,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=303780.0, ans=0.125 2024-08-10 01:47:22,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=303780.0, ans=0.2 2024-08-10 01:47:26,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1400, loss[loss=0.1302, beats_loss=0.01118, ecapa_loss=0.0002627, whisper_loss=0.1164, over 20187.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01242, ecapa_loss=0.000279, whisper_loss=0.09824, over 3836825.76 frames. ], batch size: 78, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:47:40,650 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 22 from Vox, 18 from AS 2024-08-10 01:47:44,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.877e+01 3.100e+01 3.641e+01 7.400e+01, threshold=6.199e+01, percent-clipped=1.0 2024-08-10 01:47:49,467 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
13 from LS+wenet, 25 from Vox, 23 from AS 2024-08-10 01:47:55,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=304080.0, ans=0.07 2024-08-10 01:47:57,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=304080.0, ans=0.2 2024-08-10 01:48:07,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304080.0, ans=0.125 2024-08-10 01:48:07,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-08-10 01:48:10,531 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.321e+00 2024-08-10 01:48:11,474 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 01:48:37,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304280.0, ans=0.125 2024-08-10 01:48:38,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2024-08-10 01:49:09,872 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 01:49:10,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1450, loss[loss=0.08877, beats_loss=0.01456, ecapa_loss=0.0001937, whisper_loss=0.07227, over 16474.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01241, ecapa_loss=0.0002776, whisper_loss=0.09795, over 3773926.94 frames. ], batch size: 63, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:49:19,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs.
limit=15.0 2024-08-10 01:49:29,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=304480.0, ans=0.125 2024-08-10 01:49:32,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2024-08-10 01:49:44,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=304580.0, ans=0.0 2024-08-10 01:49:45,866 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS 2024-08-10 01:49:50,785 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 01:49:53,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=304580.0, ans=0.0 2024-08-10 01:49:54,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=304580.0, ans=0.2 2024-08-10 01:49:56,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=304580.0, ans=0.125 2024-08-10 01:50:26,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=304780.0, ans=0.0 2024-08-10 01:50:28,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304780.0, ans=0.125 2024-08-10 01:50:30,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1500, loss[loss=0.1341, beats_loss=0.01188, ecapa_loss=0.000316, whisper_loss=0.1191, over 21542.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01247, ecapa_loss=0.0002777, whisper_loss=0.09761, over 3806698.91 frames.
], batch size: 88, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:50:36,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-08-10 01:50:37,721 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 01:50:38,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-10 01:50:49,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=304980.0, ans=0.0 2024-08-10 01:50:49,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.72 vs. limit=22.5 2024-08-10 01:50:49,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.825e+01 3.192e+01 3.755e+01 6.662e+01, threshold=6.384e+01, percent-clipped=1.0 2024-08-10 01:51:10,611 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 01:51:12,020 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 01:51:12,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=305080.0, ans=0.0 2024-08-10 01:51:38,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=305280.0, ans=0.125 2024-08-10 01:51:48,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1550, loss[loss=0.1057, beats_loss=0.01513, ecapa_loss=0.000282, whisper_loss=0.08777, over 18844.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01235, ecapa_loss=0.0002791, whisper_loss=0.09856, over 3791289.91 frames. 
], batch size: 77, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:51:52,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=305380.0, ans=0.0 2024-08-10 01:52:01,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=305380.0, ans=0.125 2024-08-10 01:52:05,780 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 01:52:07,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=305480.0, ans=0.125 2024-08-10 01:52:10,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=305480.0, ans=0.05 2024-08-10 01:52:15,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=305480.0, ans=0.2 2024-08-10 01:52:17,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-10 01:52:42,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=305680.0, ans=0.125 2024-08-10 01:52:43,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=305680.0, ans=0.2 2024-08-10 01:52:57,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=305780.0, ans=0.125 2024-08-10 01:53:07,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1600, loss[loss=0.1177, beats_loss=0.01174, ecapa_loss=0.0002341, whisper_loss=0.1037, over 24180.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01234, ecapa_loss=0.0002798, whisper_loss=0.09899, over 3803131.07 frames. 
], batch size: 90, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:53:15,167 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 01:53:26,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=305980.0, ans=0.0 2024-08-10 01:53:27,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.961e+01 3.443e+01 4.067e+01 6.226e+01, threshold=6.887e+01, percent-clipped=0.0 2024-08-10 01:53:41,328 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 01:53:42,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.17 vs. limit=6.0 2024-08-10 01:53:45,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=306080.0, ans=0.125 2024-08-10 01:53:46,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=306080.0, ans=0.125 2024-08-10 01:54:01,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.97 vs. limit=15.0 2024-08-10 01:54:03,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.12 vs. 
limit=6.0 2024-08-10 01:54:21,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=306280.0, ans=0.2 2024-08-10 01:54:21,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=306280.0, ans=0.0 2024-08-10 01:54:26,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2024-08-10 01:54:26,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1650, loss[loss=0.1224, beats_loss=0.01411, ecapa_loss=0.0002363, whisper_loss=0.1059, over 22070.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01228, ecapa_loss=0.0002797, whisper_loss=0.0992, over 3819321.81 frames. ], batch size: 87, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:54:28,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2024-08-10 01:54:45,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=306480.0, ans=0.0 2024-08-10 01:54:48,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=306480.0, ans=0.125 2024-08-10 01:54:49,953 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 01:54:51,294 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 01:55:11,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=306580.0, ans=0.0 2024-08-10 01:55:37,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=306780.0, ans=0.2 2024-08-10 01:55:43,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1700, loss[loss=0.09019, beats_loss=0.01287, ecapa_loss=0.0002822, whisper_loss=0.0745, over 13645.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01234, ecapa_loss=0.0002824, whisper_loss=0.1001, over 3835759.77 frames. ], batch size: 54, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:55:48,832 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 01:55:55,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=306880.0, ans=0.2 2024-08-10 01:55:57,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2024-08-10 01:56:01,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.006e+01 3.281e+01 3.850e+01 2.955e+02, threshold=6.563e+01, percent-clipped=2.0 2024-08-10 01:56:07,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=306980.0, ans=0.0 2024-08-10 01:56:12,819 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 01:56:28,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307180.0, ans=0.1 2024-08-10 01:56:32,937 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 01:56:33,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.02 vs. limit=10.0 2024-08-10 01:56:37,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=307180.0, ans=0.2 2024-08-10 01:56:57,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1750, loss[loss=0.1052, beats_loss=0.01185, ecapa_loss=0.0002788, whisper_loss=0.09051, over 18560.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01229, ecapa_loss=0.0002819, whisper_loss=0.09966, over 3825214.48 frames. ], batch size: 73, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:57:15,650 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 01:57:21,480 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 01:57:33,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=307580.0, ans=0.09899494936611666 2024-08-10 01:57:35,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=307580.0, ans=0.125 2024-08-10 01:57:37,757 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 27 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-10 01:57:38,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2024-08-10 01:57:53,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=15.0 2024-08-10 01:57:57,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=307780.0, ans=0.125 2024-08-10 01:58:06,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=307780.0, ans=0.125 2024-08-10 01:58:09,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1800, loss[loss=0.1322, beats_loss=0.01106, ecapa_loss=0.0002281, whisper_loss=0.1189, over 15670.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01219, ecapa_loss=0.000281, whisper_loss=0.1004, over 3830865.72 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:58:26,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.751e+01 3.157e+01 3.582e+01 5.631e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 01:58:34,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307980.0, ans=0.1 2024-08-10 01:58:35,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307980.0, ans=0.1 2024-08-10 01:58:56,262 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 01:58:57,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=308180.0, ans=0.2 2024-08-10 01:59:13,704 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 01:59:20,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1850, loss[loss=0.1042, beats_loss=0.01297, ecapa_loss=0.000274, whisper_loss=0.08845, over 22834.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01223, ecapa_loss=0.0002809, whisper_loss=0.1002, over 3854716.89 frames. 
], batch size: 89, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:59:20,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=308380.0, ans=0.2 2024-08-10 01:59:35,852 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 01:59:49,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=308580.0, ans=0.125 2024-08-10 01:59:52,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2024-08-10 01:59:57,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=308580.0, ans=0.2 2024-08-10 02:00:08,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308680.0, ans=0.1 2024-08-10 02:00:09,608 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 02:00:23,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=308780.0, ans=0.0 2024-08-10 02:00:30,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1900, loss[loss=0.1014, beats_loss=0.0149, ecapa_loss=0.0003199, whisper_loss=0.08325, over 21405.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01219, ecapa_loss=0.0002872, whisper_loss=0.09984, over 3829075.70 frames. 
], batch size: 92, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 02:00:45,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308980.0, ans=0.1 2024-08-10 02:00:47,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.899e+01 3.416e+01 4.271e+01 7.702e+01, threshold=6.832e+01, percent-clipped=2.0 2024-08-10 02:00:51,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=308980.0, ans=0.2 2024-08-10 02:01:01,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=309080.0, ans=0.07 2024-08-10 02:01:05,400 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 02:01:06,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=309080.0, ans=0.0 2024-08-10 02:01:07,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=309080.0, ans=0.0 2024-08-10 02:01:12,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309180.0, ans=0.125 2024-08-10 02:01:29,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=309280.0, ans=0.0 2024-08-10 02:01:37,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.22 vs. limit=15.0 2024-08-10 02:01:39,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 1950, loss[loss=0.1177, beats_loss=0.01363, ecapa_loss=0.000295, whisper_loss=0.1011, over 18814.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01225, ecapa_loss=0.0002935, whisper_loss=0.099, over 3820849.70 frames. 
], batch size: 73, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 02:01:45,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=309380.0, ans=0.09899494936611666 2024-08-10 02:01:55,245 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-10 02:01:56,437 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 02:02:02,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=309480.0, ans=0.0 2024-08-10 02:02:19,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=309580.0, ans=0.2 2024-08-10 02:02:35,556 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 02:02:40,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=309780.0, ans=0.0 2024-08-10 02:02:51,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2000, loss[loss=0.1229, beats_loss=0.01267, ecapa_loss=0.0002917, whisper_loss=0.1073, over 19289.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01221, ecapa_loss=0.0002957, whisper_loss=0.1, over 3817640.28 frames. ], batch size: 75, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:02:58,609 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 02:03:03,674 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-10 02:03:08,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=309980.0, ans=0.2 2024-08-10 02:03:09,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.983e+01 3.552e+01 3.984e+01 6.262e+01, threshold=7.103e+01, percent-clipped=0.0 2024-08-10 02:03:17,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-08-10 02:03:21,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=310080.0, ans=0.1 2024-08-10 02:03:38,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310180.0, ans=0.1 2024-08-10 02:03:38,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=310180.0, ans=0.0 2024-08-10 02:03:53,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2024-08-10 02:04:03,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2050, loss[loss=0.1211, beats_loss=0.01234, ecapa_loss=0.0002904, whisper_loss=0.1058, over 15607.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01221, ecapa_loss=0.0002981, whisper_loss=0.09948, over 3834119.51 frames. ], batch size: 59, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:04:13,636 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 02:04:22,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=310480.0, ans=0.125 2024-08-10 02:04:36,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=310580.0, ans=0.125 2024-08-10 02:04:46,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=310680.0, ans=0.125 2024-08-10 02:05:04,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=310780.0, ans=0.125 2024-08-10 02:05:07,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.22 vs. limit=10.0 2024-08-10 02:05:13,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2100, loss[loss=0.1072, beats_loss=0.01289, ecapa_loss=0.0003683, whisper_loss=0.09058, over 20532.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01236, ecapa_loss=0.0002967, whisper_loss=0.09883, over 3816119.01 frames. ], batch size: 88, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:05:24,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=310880.0, ans=0.0 2024-08-10 02:05:25,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2024-08-10 02:05:29,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.901e+01 3.264e+01 3.705e+01 5.595e+01, threshold=6.528e+01, percent-clipped=0.0 2024-08-10 02:05:31,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=310980.0, ans=0.125 2024-08-10 02:05:45,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-10 02:05:47,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=311080.0, ans=0.1 2024-08-10 02:06:05,241 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 02:06:20,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=311280.0, ans=0.125 2024-08-10 02:06:23,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2150, loss[loss=0.1243, beats_loss=0.009562, ecapa_loss=0.0003575, whisper_loss=0.1112, over 15102.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01239, ecapa_loss=0.0002983, whisper_loss=0.09868, over 3814503.43 frames. ], batch size: 58, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:06:28,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=311380.0, ans=0.125 2024-08-10 02:06:36,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2024-08-10 02:06:37,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=311480.0, ans=0.0 2024-08-10 02:06:42,888 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
18 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 02:07:00,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=12.0 2024-08-10 02:07:12,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=311680.0, ans=0.125 2024-08-10 02:07:16,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2024-08-10 02:07:24,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311780.0, ans=0.1 2024-08-10 02:07:34,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=311780.0, ans=0.125 2024-08-10 02:07:37,006 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 02:07:38,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2200, loss[loss=0.1377, beats_loss=0.01152, ecapa_loss=0.0003308, whisper_loss=0.1228, over 22572.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01235, ecapa_loss=0.0002998, whisper_loss=0.09921, over 3844076.09 frames. 
], batch size: 90, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:07:50,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=311880.0, ans=15.0 2024-08-10 02:07:55,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.913e+01 3.407e+01 3.904e+01 7.612e+01, threshold=6.814e+01, percent-clipped=1.0 2024-08-10 02:08:00,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=311980.0, ans=0.1 2024-08-10 02:08:10,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312080.0, ans=0.125 2024-08-10 02:08:20,093 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 02:08:30,512 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-10 02:08:34,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312180.0, ans=0.125 2024-08-10 02:08:37,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=312280.0, ans=0.125 2024-08-10 02:08:50,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2250, loss[loss=0.1557, beats_loss=0.01026, ecapa_loss=0.0003625, whisper_loss=0.1418, over 22879.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01241, ecapa_loss=0.0003007, whisper_loss=0.09948, over 3867543.27 frames. ], batch size: 89, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:08:52,379 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 02:08:55,188 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-10 02:09:19,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=312580.0, ans=0.125 2024-08-10 02:09:33,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=312680.0, ans=0.0 2024-08-10 02:09:53,674 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 02:10:03,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2300, loss[loss=0.1032, beats_loss=0.01297, ecapa_loss=0.00025, whisper_loss=0.08772, over 15522.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.0124, ecapa_loss=0.0002993, whisper_loss=0.09969, over 3881412.51 frames. ], batch size: 59, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:10:13,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=312880.0, ans=0.02 2024-08-10 02:10:15,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=312880.0, ans=0.125 2024-08-10 02:10:20,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=312980.0, ans=0.0 2024-08-10 02:10:21,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 3.060e+01 3.416e+01 3.893e+01 7.548e+01, threshold=6.833e+01, percent-clipped=2.0 2024-08-10 02:10:21,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=312980.0, ans=0.0 2024-08-10 02:10:23,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. 
limit=15.0 2024-08-10 02:10:24,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312980.0, ans=0.1 2024-08-10 02:10:45,119 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 02:11:01,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313280.0, ans=0.1 2024-08-10 02:11:02,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=313280.0, ans=0.125 2024-08-10 02:11:02,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=313280.0, ans=0.2 2024-08-10 02:11:07,704 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 02:11:12,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=313280.0, ans=0.0 2024-08-10 02:11:14,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2350, loss[loss=0.1024, beats_loss=0.009969, ecapa_loss=0.0003734, whisper_loss=0.08871, over 13349.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01237, ecapa_loss=0.0002995, whisper_loss=0.1001, over 3884049.73 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:11:23,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-10 02:11:36,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-08-10 02:11:55,376 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 02:12:04,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-08-10 02:12:23,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=313780.0, ans=0.125 2024-08-10 02:12:28,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2400, loss[loss=0.1041, beats_loss=0.01269, ecapa_loss=0.0002735, whisper_loss=0.08864, over 20046.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01223, ecapa_loss=0.0003017, whisper_loss=0.101, over 3924823.89 frames. ], batch size: 78, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:12:28,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=313880.0, ans=0.0 2024-08-10 02:12:32,783 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 02:12:38,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=313880.0, ans=0.125 2024-08-10 02:12:40,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=313880.0, ans=0.125 2024-08-10 02:12:42,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=313980.0, ans=0.04949747468305833 2024-08-10 02:12:44,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.008e+01 3.355e+01 4.317e+01 6.888e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 02:13:08,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=314080.0, ans=0.2 2024-08-10 02:13:23,533 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314180.0, ans=0.1 2024-08-10 02:13:28,725 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 02:13:33,540 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 02:13:40,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2450, loss[loss=0.1357, beats_loss=0.009356, ecapa_loss=0.0003507, whisper_loss=0.1228, over 21710.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01216, ecapa_loss=0.0003022, whisper_loss=0.1007, over 3915738.07 frames. ], batch size: 89, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:13:44,054 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 02:14:01,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=314480.0, ans=0.125 2024-08-10 02:14:24,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=314680.0, ans=22.5 2024-08-10 02:14:36,038 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 02:14:54,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2500, loss[loss=0.1633, beats_loss=0.009, ecapa_loss=0.0002888, whisper_loss=0.1514, over 17891.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01221, ecapa_loss=0.0003009, whisper_loss=0.1009, over 3897419.92 frames. ], batch size: 67, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:14:54,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=314880.0, ans=0.125 2024-08-10 02:14:57,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.08 vs. 
limit=12.0 2024-08-10 02:15:03,949 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 02:15:05,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2024-08-10 02:15:12,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.053e+01 3.458e+01 4.005e+01 5.985e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 02:15:24,024 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-10 02:15:27,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=315080.0, ans=0.0 2024-08-10 02:15:41,792 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 02:15:46,195 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 31 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-10 02:15:58,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315280.0, ans=0.125 2024-08-10 02:16:00,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2024-08-10 02:16:07,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2550, loss[loss=0.108, beats_loss=0.0144, ecapa_loss=0.0002623, whisper_loss=0.091, over 22517.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01211, ecapa_loss=0.000301, whisper_loss=0.1015, over 3902175.90 frames. 
], batch size: 92, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:16:21,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=315480.0, ans=0.0 2024-08-10 02:16:24,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=315480.0, ans=0.0 2024-08-10 02:16:27,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-10 02:16:31,137 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-10 02:16:42,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=315580.0, ans=15.0 2024-08-10 02:16:45,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315580.0, ans=0.125 2024-08-10 02:16:47,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-10 02:17:08,953 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 02:17:11,714 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 02:17:11,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=315780.0, ans=0.125 2024-08-10 02:17:20,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2600, loss[loss=0.101, beats_loss=0.01235, ecapa_loss=0.0003611, whisper_loss=0.08501, over 17893.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01209, ecapa_loss=0.0003011, whisper_loss=0.1017, over 3882040.60 frames. 
], batch size: 79, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:17:38,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.748e+01 3.170e+01 3.706e+01 6.461e+01, threshold=6.341e+01, percent-clipped=0.0 2024-08-10 02:17:41,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=315980.0, ans=0.125 2024-08-10 02:17:43,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2024-08-10 02:17:53,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=316080.0, ans=0.0 2024-08-10 02:18:04,704 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.728e+00 2024-08-10 02:18:10,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.80 vs. limit=15.0 2024-08-10 02:18:14,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=316180.0, ans=10.0 2024-08-10 02:18:38,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2650, loss[loss=0.09639, beats_loss=0.01403, ecapa_loss=0.0002748, whisper_loss=0.07961, over 20987.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01206, ecapa_loss=0.0002995, whisper_loss=0.1016, over 3881308.58 frames. ], batch size: 86, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:18:38,457 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 02:19:07,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2024-08-10 02:19:41,974 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 02:19:54,363 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2700, loss[loss=0.08574, beats_loss=0.01442, ecapa_loss=0.000286, whisper_loss=0.06846, over 15034.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01224, ecapa_loss=0.0002968, whisper_loss=0.1004, over 3855683.41 frames. ], batch size: 64, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:19:55,332 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 02:20:03,429 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 02:20:04,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-10 02:20:11,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.947e+01 3.317e+01 3.968e+01 5.790e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 02:20:22,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=316980.0, ans=0.125 2024-08-10 02:20:42,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2024-08-10 02:20:59,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=317280.0, ans=0.0 2024-08-10 02:21:07,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2750, loss[loss=0.1432, beats_loss=0.008706, ecapa_loss=0.0003432, whisper_loss=0.131, over 16283.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01219, ecapa_loss=0.0002978, whisper_loss=0.1008, over 3858825.36 frames. 
], batch size: 65, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:21:13,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=317380.0, ans=0.125 2024-08-10 02:21:14,566 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 02:21:34,978 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 02:21:54,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2024-08-10 02:22:19,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317780.0, ans=0.1 2024-08-10 02:22:21,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=317780.0, ans=0.0 2024-08-10 02:22:24,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2800, loss[loss=0.1021, beats_loss=0.01191, ecapa_loss=0.0002914, whisper_loss=0.08728, over 17974.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01216, ecapa_loss=0.0002997, whisper_loss=0.1002, over 3851050.84 frames. ], batch size: 71, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:22:29,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=317880.0, ans=0.125 2024-08-10 02:22:40,438 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 02:22:43,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.037e+01 3.440e+01 4.229e+01 1.125e+02, threshold=6.879e+01, percent-clipped=1.0 2024-08-10 02:22:53,873 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 02:23:01,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-08-10 02:23:05,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318080.0, ans=0.125 2024-08-10 02:23:16,618 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 02:23:27,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2024-08-10 02:23:29,896 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 02:23:31,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318280.0, ans=0.125 2024-08-10 02:23:37,061 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 02:23:39,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2850, loss[loss=0.1415, beats_loss=0.0099, ecapa_loss=0.0003269, whisper_loss=0.1283, over 23870.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01231, ecapa_loss=0.000301, whisper_loss=0.09993, over 3859608.26 frames. 
], batch size: 90, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:24:12,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=318580.0, ans=0.125 2024-08-10 02:24:21,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=318580.0, ans=0.125 2024-08-10 02:24:51,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=318780.0, ans=0.2 2024-08-10 02:24:52,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318780.0, ans=0.1 2024-08-10 02:24:54,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=318780.0, ans=0.2 2024-08-10 02:24:58,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318780.0, ans=0.1 2024-08-10 02:25:01,642 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2900, loss[loss=0.1163, beats_loss=0.0142, ecapa_loss=0.0002556, whisper_loss=0.09957, over 18959.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01229, ecapa_loss=0.0003022, whisper_loss=0.1005, over 3865657.92 frames. ], batch size: 73, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:25:03,515 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 02:25:09,974 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 02:25:15,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=318980.0, ans=0.2 2024-08-10 02:25:17,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318980.0, ans=0.125 2024-08-10 02:25:19,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.971e+01 3.564e+01 4.159e+01 7.122e+01, threshold=7.127e+01, percent-clipped=1.0 2024-08-10 02:25:27,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=318980.0, ans=0.2 2024-08-10 02:25:33,689 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 02:25:36,900 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.668e+00 2024-08-10 02:25:37,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2024-08-10 02:25:46,215 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 02:25:52,864 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 02:26:05,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319280.0, ans=0.1 2024-08-10 02:26:12,331 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 02:26:17,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 2950, loss[loss=0.1264, beats_loss=0.01084, ecapa_loss=0.0002459, whisper_loss=0.1131, over 16893.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01228, ecapa_loss=0.0003023, whisper_loss=0.1006, over 3882030.56 frames. 
], batch size: 62, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:26:33,260 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-10 02:26:37,392 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 02:26:41,628 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 02:26:45,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319580.0, ans=0.1 2024-08-10 02:27:08,403 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 02:27:17,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=319780.0, ans=0.0 2024-08-10 02:27:23,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3000, loss[loss=0.1175, beats_loss=0.01249, ecapa_loss=0.0002915, whisper_loss=0.1021, over 22257.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0124, ecapa_loss=0.0002996, whisper_loss=0.1, over 3869426.08 frames. ], batch size: 88, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:27:23,994 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 02:28:04,492 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on ASR_libri: loss=0.2772, beats_loss=0, ecapa_loss=0.0008938, whisper_loss=0.2682, over 922467.00 frames. 2024-08-10 02:28:22,902 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on SV_voxceleb1: loss=0.007832, beats_loss=0, ecapa_loss=0.0007832, whisper_loss=0, over 939242.00 frames. 
2024-08-10 02:29:46,327 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1316, 1.6658, 1.9729, 2.1025], device='cuda:3') 2024-08-10 02:30:19,809 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on AT_audioset: loss=0.02861, beats_loss=0.02861, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 02:30:19,812 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 02:30:24,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.525e+00 2024-08-10 02:30:31,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=319880.0, ans=0.125 2024-08-10 02:30:32,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=319980.0, ans=0.0 2024-08-10 02:30:38,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=319980.0, ans=0.2 2024-08-10 02:30:38,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.871e+01 3.251e+01 3.853e+01 5.451e+01, threshold=6.502e+01, percent-clipped=0.0 2024-08-10 02:30:51,345 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
26 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 02:30:54,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=320080.0, ans=0.0 2024-08-10 02:30:54,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=320080.0, ans=0.2 2024-08-10 02:30:56,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=320080.0, ans=0.0 2024-08-10 02:31:14,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320180.0, ans=0.125 2024-08-10 02:31:21,428 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 02:31:30,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3050, loss[loss=0.1001, beats_loss=0.01002, ecapa_loss=0.0003995, whisper_loss=0.08604, over 17041.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.0124, ecapa_loss=0.000299, whisper_loss=0.1005, over 3880336.18 frames. ], batch size: 69, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:31:55,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=320480.0, ans=0.125 2024-08-10 02:32:19,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=320680.0, ans=0.2 2024-08-10 02:32:26,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=320780.0, ans=0.0 2024-08-10 02:32:39,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3100, loss[loss=0.114, beats_loss=0.01428, ecapa_loss=0.0002762, whisper_loss=0.09699, over 22631.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01238, ecapa_loss=0.0002997, whisper_loss=0.1002, over 3855683.35 frames. 
], batch size: 91, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:32:44,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=320880.0, ans=0.0 2024-08-10 02:32:51,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=320880.0, ans=0.125 2024-08-10 02:32:55,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=320980.0, ans=0.09899494936611666 2024-08-10 02:32:56,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.934e+01 3.353e+01 3.892e+01 7.432e+01, threshold=6.707e+01, percent-clipped=2.0 2024-08-10 02:32:56,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 02:33:06,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=321080.0, ans=0.1 2024-08-10 02:33:24,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2024-08-10 02:33:35,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=321280.0, ans=0.0 2024-08-10 02:33:46,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-10 02:33:48,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3150, loss[loss=0.1389, beats_loss=0.009114, ecapa_loss=0.0003982, whisper_loss=0.1258, over 22394.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01234, ecapa_loss=0.0002995, whisper_loss=0.1009, over 3857545.24 frames. 
], batch size: 94, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:34:08,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2024-08-10 02:34:15,204 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 02:34:16,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=321580.0, ans=0.125 2024-08-10 02:34:28,695 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 02:34:30,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-08-10 02:34:33,998 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 02:34:57,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3200, loss[loss=0.09205, beats_loss=0.01206, ecapa_loss=0.0002624, whisper_loss=0.07737, over 17993.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01239, ecapa_loss=0.0003004, whisper_loss=0.1005, over 3863189.16 frames. ], batch size: 68, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:34:58,891 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 02:35:05,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=321880.0, ans=0.0 2024-08-10 02:35:08,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=321880.0, ans=22.5 2024-08-10 02:35:13,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.789e+01 3.261e+01 3.853e+01 5.155e+01, threshold=6.521e+01, percent-clipped=0.0 2024-08-10 02:35:17,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=12.0 2024-08-10 02:35:20,532 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 02:35:26,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=322080.0, ans=0.125 2024-08-10 02:35:40,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-08-10 02:35:49,039 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-10 02:35:49,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-08-10 02:35:53,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=322280.0, ans=0.0 2024-08-10 02:35:57,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. 
limit=15.0 2024-08-10 02:36:00,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322280.0, ans=0.1 2024-08-10 02:36:06,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3250, loss[loss=0.09168, beats_loss=0.01518, ecapa_loss=0.0003134, whisper_loss=0.07337, over 15875.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01242, ecapa_loss=0.0002987, whisper_loss=0.09998, over 3855038.67 frames. ], batch size: 66, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:36:06,888 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 02:36:07,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=322380.0, ans=0.125 2024-08-10 02:36:12,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=322380.0, ans=0.125 2024-08-10 02:36:13,721 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 02:36:27,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0 2024-08-10 02:36:42,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322580.0, ans=0.0 2024-08-10 02:36:55,060 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
32 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 02:36:59,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=322680.0, ans=0.0 2024-08-10 02:37:02,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=322780.0, ans=0.0 2024-08-10 02:37:15,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3300, loss[loss=0.1031, beats_loss=0.01421, ecapa_loss=0.000263, whisper_loss=0.08621, over 19427.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01234, ecapa_loss=0.0002997, whisper_loss=0.1006, over 3874602.15 frames. ], batch size: 76, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:37:31,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.021e+01 3.431e+01 4.015e+01 7.071e+01, threshold=6.862e+01, percent-clipped=2.0 2024-08-10 02:37:37,413 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-10 02:37:38,830 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 02:37:40,241 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 02:37:57,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. 
limit=10.0 2024-08-10 02:37:59,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=323180.0, ans=0.1 2024-08-10 02:38:12,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=323280.0, ans=0.05 2024-08-10 02:38:13,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=323280.0, ans=0.125 2024-08-10 02:38:13,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.137e+05 2024-08-10 02:38:23,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3350, loss[loss=0.1074, beats_loss=0.01442, ecapa_loss=0.0002408, whisper_loss=0.09058, over 22077.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01237, ecapa_loss=0.0002964, whisper_loss=0.1006, over 3870714.83 frames. ], batch size: 86, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:38:27,000 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 02:38:31,121 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 02:38:44,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=323480.0, ans=0.125 2024-08-10 02:38:46,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=323480.0, ans=0.0 2024-08-10 02:38:47,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5 2024-08-10 02:38:48,281 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
19 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-10 02:39:10,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-10 02:39:17,890 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 02:39:22,168 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:39:23,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=323780.0, ans=0.2 2024-08-10 02:39:28,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323780.0, ans=0.125 2024-08-10 02:39:31,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3400, loss[loss=0.09285, beats_loss=0.01562, ecapa_loss=0.0002997, whisper_loss=0.07423, over 19049.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01241, ecapa_loss=0.000294, whisper_loss=0.1003, over 3910011.03 frames. 
], batch size: 83, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:39:35,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=323880.0, ans=0.125 2024-08-10 02:39:47,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 2.812e+01 3.293e+01 3.899e+01 6.283e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-10 02:39:53,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=323980.0, ans=0.025 2024-08-10 02:40:00,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=324080.0, ans=0.125 2024-08-10 02:40:08,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=324080.0, ans=0.0 2024-08-10 02:40:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=324280.0, ans=0.125 2024-08-10 02:40:39,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3450, loss[loss=0.1474, beats_loss=0.01027, ecapa_loss=0.0002724, whisper_loss=0.1344, over 21617.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01243, ecapa_loss=0.0002931, whisper_loss=0.1002, over 3891320.64 frames. ], batch size: 82, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:40:46,535 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 02:40:51,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.05 vs. limit=15.0 2024-08-10 02:40:56,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=324480.0, ans=0.2 2024-08-10 02:41:23,110 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 02:41:32,805 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 02:41:33,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=324680.0, ans=0.125 2024-08-10 02:41:35,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=324780.0, ans=0.0 2024-08-10 02:41:38,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=324780.0, ans=0.0 2024-08-10 02:41:48,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3500, loss[loss=0.12, beats_loss=0.01262, ecapa_loss=0.0002215, whisper_loss=0.1052, over 21686.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01245, ecapa_loss=0.0002941, whisper_loss=0.1002, over 3893158.08 frames. ], batch size: 83, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:41:50,239 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 02:41:54,742 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 02:41:56,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=324880.0, ans=0.5 2024-08-10 02:42:04,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=324980.0, ans=0.125 2024-08-10 02:42:05,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 3.058e+01 3.643e+01 4.338e+01 7.554e+01, threshold=7.285e+01, percent-clipped=1.0 2024-08-10 02:42:15,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=325080.0, ans=0.125 2024-08-10 02:42:24,861 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 02:42:26,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325080.0, ans=0.125 2024-08-10 02:42:27,572 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 02:42:31,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=325180.0, ans=0.05 2024-08-10 02:42:48,336 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 02:42:57,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3550, loss[loss=0.1163, beats_loss=0.009858, ecapa_loss=0.0003037, whisper_loss=0.1034, over 16845.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.0125, ecapa_loss=0.0002939, whisper_loss=0.09938, over 3898521.52 frames. ], batch size: 65, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:42:58,059 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-10 02:42:59,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=325380.0, ans=0.05 2024-08-10 02:43:16,109 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 02:43:30,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=325580.0, ans=0.125 2024-08-10 02:43:48,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325680.0, ans=0.125 2024-08-10 02:43:49,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=325680.0, ans=0.125 2024-08-10 02:43:55,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=325780.0, ans=0.125 2024-08-10 02:44:02,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=325780.0, ans=0.125 2024-08-10 02:44:07,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3600, loss[loss=0.09575, beats_loss=0.01344, ecapa_loss=0.0002251, whisper_loss=0.08006, over 15430.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01244, ecapa_loss=0.0002961, whisper_loss=0.09901, over 3880617.16 frames. ], batch size: 57, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:44:16,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=325880.0, ans=0.2 2024-08-10 02:44:20,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.49 vs. 
limit=15.0 2024-08-10 02:44:23,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.942e+01 3.351e+01 3.815e+01 6.062e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 02:45:17,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3650, loss[loss=0.119, beats_loss=0.0131, ecapa_loss=0.0002593, whisper_loss=0.1033, over 23617.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01241, ecapa_loss=0.0002971, whisper_loss=0.09874, over 3864351.40 frames. ], batch size: 90, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:45:21,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=326380.0, ans=0.0 2024-08-10 02:45:29,815 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 02:45:42,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2024-08-10 02:45:51,604 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 02:45:57,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=326680.0, ans=0.125 2024-08-10 02:46:04,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. limit=10.0 2024-08-10 02:46:12,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=12.0 2024-08-10 02:46:14,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. 
limit=15.0 2024-08-10 02:46:25,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3700, loss[loss=0.1092, beats_loss=0.0108, ecapa_loss=0.0003092, whisper_loss=0.09534, over 20508.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01248, ecapa_loss=0.0002977, whisper_loss=0.09823, over 3862349.67 frames. ], batch size: 84, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:46:42,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.947e+01 3.360e+01 4.039e+01 7.794e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 02:46:42,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=326980.0, ans=0.2 2024-08-10 02:46:44,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=326980.0, ans=0.2 2024-08-10 02:46:45,325 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 02:46:57,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=327080.0, ans=0.0 2024-08-10 02:46:57,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=327080.0, ans=0.125 2024-08-10 02:47:15,359 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 02:47:20,846 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 02:47:22,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=327280.0, ans=0.125 2024-08-10 02:47:25,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. 
limit=15.0 2024-08-10 02:47:33,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3750, loss[loss=0.1058, beats_loss=0.01495, ecapa_loss=0.000277, whisper_loss=0.08811, over 17008.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01261, ecapa_loss=0.0002944, whisper_loss=0.09835, over 3863457.36 frames. ], batch size: 70, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:47:41,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-10 02:47:49,304 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 02:47:50,645 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 02:48:08,522 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 02:48:11,317 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 02:48:17,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=327680.0, ans=0.125 2024-08-10 02:48:19,906 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:48:28,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=327780.0, ans=0.0 2024-08-10 02:48:32,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=327780.0, ans=0.05 2024-08-10 02:48:32,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=327780.0, ans=0.125 2024-08-10 02:48:33,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=327780.0, ans=0.0 2024-08-10 02:48:42,491 INFO 
[train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3800, loss[loss=0.1244, beats_loss=0.008173, ecapa_loss=0.0002865, whisper_loss=0.1134, over 18795.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01256, ecapa_loss=0.0002951, whisper_loss=0.09876, over 3879205.85 frames. ], batch size: 69, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:48:50,974 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 02:48:53,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=327880.0, ans=0.0 2024-08-10 02:48:58,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.560e+01 3.072e+01 3.520e+01 3.991e+01 6.360e+01, threshold=7.040e+01, percent-clipped=0.0 2024-08-10 02:49:08,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0 2024-08-10 02:49:09,007 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 02:49:14,477 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 02:49:14,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=328080.0, ans=0.125 2024-08-10 02:49:14,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328080.0, ans=0.1 2024-08-10 02:49:18,692 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 02:49:41,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. 
limit=10.0 2024-08-10 02:49:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=328280.0, ans=0.2 2024-08-10 02:49:51,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3850, loss[loss=0.141, beats_loss=0.01132, ecapa_loss=0.0002768, whisper_loss=0.1269, over 17344.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01254, ecapa_loss=0.0002948, whisper_loss=0.09885, over 3858895.20 frames. ], batch size: 65, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:50:02,784 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 02:50:08,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=328480.0, ans=0.09899494936611666 2024-08-10 02:50:14,137 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 02:50:17,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=328480.0, ans=0.125 2024-08-10 02:50:19,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=328580.0, ans=0.95 2024-08-10 02:50:20,737 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 02:50:23,357 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-10 02:50:31,569 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 02:50:31,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=328680.0, ans=0.125 2024-08-10 02:50:36,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=328680.0, ans=0.125 2024-08-10 02:50:46,466 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 02:50:50,532 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 02:50:52,963 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 02:50:54,283 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 02:50:56,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0 2024-08-10 02:50:59,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3900, loss[loss=0.1292, beats_loss=0.01232, ecapa_loss=0.0003184, whisper_loss=0.1137, over 22886.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01253, ecapa_loss=0.0002973, whisper_loss=0.09924, over 3890329.83 frames. ], batch size: 92, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:51:02,088 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 02:51:09,181 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 02:51:10,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=328880.0, ans=0.125 2024-08-10 02:51:15,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.142e+01 3.558e+01 4.007e+01 5.949e+01, threshold=7.115e+01, percent-clipped=0.0 2024-08-10 02:51:17,270 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 02:51:21,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=328980.0, ans=0.125 2024-08-10 02:51:24,267 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:51:25,298 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 02:51:26,539 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 02:51:28,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2024-08-10 02:51:29,423 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 02:51:33,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=329080.0, ans=0.125 2024-08-10 02:51:37,725 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 02:52:01,947 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-10 02:52:06,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 3950, loss[loss=0.08976, beats_loss=0.009105, ecapa_loss=0.0003842, whisper_loss=0.07681, over 20009.00 frames. 
], tot_loss[loss=0.1155, beats_loss=0.01247, ecapa_loss=0.0002998, whisper_loss=0.1, over 3913466.49 frames. ], batch size: 82, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:52:23,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=15.0 2024-08-10 02:52:33,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2024-08-10 02:52:34,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=329580.0, ans=0.125 2024-08-10 02:52:39,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=329580.0, ans=0.125 2024-08-10 02:52:40,419 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 02:52:46,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=329680.0, ans=0.0 2024-08-10 02:52:48,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=329680.0, ans=0.04949747468305833 2024-08-10 02:52:51,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329680.0, ans=0.1 2024-08-10 02:52:52,748 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 02:53:04,560 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 02:53:10,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329780.0, ans=0.1 2024-08-10 02:53:13,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4000, loss[loss=0.1276, beats_loss=0.009342, ecapa_loss=0.0003521, whisper_loss=0.1147, over 20020.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01236, ecapa_loss=0.0003014, whisper_loss=0.1004, over 3944772.79 frames. ], batch size: 82, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:53:22,092 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 02:53:22,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=329880.0, ans=0.125 2024-08-10 02:53:29,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=329980.0, ans=0.125 2024-08-10 02:53:30,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 3.066e+01 3.430e+01 3.923e+01 5.367e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-10 02:53:38,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=329980.0, ans=0.125 2024-08-10 02:53:49,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=330080.0, ans=0.05 2024-08-10 02:54:04,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330180.0, ans=0.1 2024-08-10 02:54:06,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330180.0, ans=0.1 2024-08-10 02:54:09,113 INFO [scaling.py:1024] (3/4) Whitening: 
name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0 2024-08-10 02:54:11,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330280.0, ans=0.1 2024-08-10 02:54:21,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4050, loss[loss=0.1186, beats_loss=0.01313, ecapa_loss=0.0001847, whisper_loss=0.1036, over 19469.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01224, ecapa_loss=0.0003023, whisper_loss=0.1008, over 3925733.55 frames. ], batch size: 71, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:54:23,522 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 02:54:32,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2024-08-10 02:54:44,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=330480.0, ans=10.0 2024-08-10 02:54:45,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=330480.0, ans=0.0 2024-08-10 02:55:04,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330680.0, ans=0.1 2024-08-10 02:55:06,494 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 02:55:10,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2024-08-10 02:55:13,082 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
22 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-10 02:55:21,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=12.0 2024-08-10 02:55:28,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4100, loss[loss=0.09397, beats_loss=0.01691, ecapa_loss=0.0002904, whisper_loss=0.07416, over 15034.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01226, ecapa_loss=0.0003012, whisper_loss=0.1008, over 3909526.52 frames. ], batch size: 62, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:55:30,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=330880.0, ans=0.1 2024-08-10 02:55:39,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330880.0, ans=0.1 2024-08-10 02:55:44,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.941e+01 3.171e+01 3.928e+01 6.026e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 02:55:45,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-08-10 02:56:09,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=331180.0, ans=0.2 2024-08-10 02:56:12,231 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 02:56:17,551 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 02:56:20,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=331180.0, ans=0.0 2024-08-10 02:56:33,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=331280.0, ans=0.125 2024-08-10 02:56:35,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4150, loss[loss=0.09784, beats_loss=0.01208, ecapa_loss=0.0003643, whisper_loss=0.08212, over 14043.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01237, ecapa_loss=0.0002993, whisper_loss=0.1008, over 3903232.22 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:56:36,493 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:56:44,139 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-10 02:56:44,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331380.0, ans=0.1 2024-08-10 02:56:57,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2024-08-10 02:57:03,040 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 02:57:03,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2024-08-10 02:57:15,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=331680.0, ans=0.125 2024-08-10 02:57:20,328 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-10 02:57:36,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331780.0, ans=0.1 2024-08-10 02:57:41,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=331880.0, ans=10.0 2024-08-10 02:57:42,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4200, loss[loss=0.1373, beats_loss=0.01022, ecapa_loss=0.0002908, whisper_loss=0.1242, over 23217.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01235, ecapa_loss=0.0002997, whisper_loss=0.1006, over 3899555.78 frames. ], batch size: 88, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:57:44,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=331880.0, ans=0.0 2024-08-10 02:57:58,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.978e+01 3.511e+01 4.145e+01 7.481e+01, threshold=7.022e+01, percent-clipped=3.0 2024-08-10 02:58:32,250 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 02:58:33,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=332180.0, ans=0.125 2024-08-10 02:58:48,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332280.0, ans=0.1 2024-08-10 02:58:50,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4250, loss[loss=0.1188, beats_loss=0.01247, ecapa_loss=0.000239, whisper_loss=0.1039, over 17439.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01227, ecapa_loss=0.0002992, whisper_loss=0.1003, over 3884234.34 frames. 
], batch size: 66, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:58:53,874 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.573e+03 2024-08-10 02:59:02,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=18.09 vs. limit=15.0 2024-08-10 02:59:18,809 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 02:59:24,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-10 02:59:36,401 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 02:59:38,945 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 02:59:39,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=332680.0, ans=0.125 2024-08-10 02:59:51,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=332780.0, ans=0.125 2024-08-10 02:59:52,707 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 02:59:59,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4300, loss[loss=0.112, beats_loss=0.01217, ecapa_loss=0.0003231, whisper_loss=0.09656, over 19645.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01234, ecapa_loss=0.0002956, whisper_loss=0.09965, over 3880899.69 frames. ], batch size: 76, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 03:00:02,536 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 03:00:10,507 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 03:00:13,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=332980.0, ans=0.125 2024-08-10 03:00:15,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.965e+01 3.329e+01 3.837e+01 6.258e+01, threshold=6.658e+01, percent-clipped=0.0 2024-08-10 03:00:26,868 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.938e+01 2024-08-10 03:00:30,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=333080.0, ans=0.2 2024-08-10 03:00:33,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=333080.0, ans=0.05 2024-08-10 03:00:34,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2024-08-10 03:00:46,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2024-08-10 03:00:56,078 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 03:01:06,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4350, loss[loss=0.1105, beats_loss=0.01583, ecapa_loss=0.0002428, whisper_loss=0.09223, over 21782.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01241, ecapa_loss=0.0002937, whisper_loss=0.09831, over 3860603.66 frames. ], batch size: 88, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:01:08,085 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 03:01:13,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.07 vs. 
limit=15.0 2024-08-10 03:01:33,667 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS 2024-08-10 03:01:52,996 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 15 from Vox, 38 from AS 2024-08-10 03:02:03,785 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 from AS 2024-08-10 03:02:13,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4400, loss[loss=0.1399, beats_loss=0.00896, ecapa_loss=0.0003737, whisper_loss=0.1272, over 21662.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01233, ecapa_loss=0.000294, whisper_loss=0.09884, over 3850524.42 frames. ], batch size: 87, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:02:16,720 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 from AS 2024-08-10 03:02:21,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=15.0 2024-08-10 03:02:30,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.919e+01 3.248e+01 3.746e+01 6.587e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 03:02:32,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333980.0, ans=0.1 2024-08-10 03:02:38,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=333980.0, ans=0.125 2024-08-10 03:02:38,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=333980.0, ans=0.125 2024-08-10 03:02:40,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.65 vs. 
limit=15.0 2024-08-10 03:02:41,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=334080.0, ans=0.1 2024-08-10 03:03:13,905 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 03:03:16,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334280.0, ans=0.1 2024-08-10 03:03:22,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4450, loss[loss=0.1091, beats_loss=0.01269, ecapa_loss=0.0003608, whisper_loss=0.09281, over 21812.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01234, ecapa_loss=0.000293, whisper_loss=0.09821, over 3835394.14 frames. ], batch size: 91, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:03:31,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334380.0, ans=0.1 2024-08-10 03:04:10,924 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 13 from Vox, 39 from AS 2024-08-10 03:04:33,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334780.0, ans=0.1 2024-08-10 03:04:33,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=334780.0, ans=0.2 2024-08-10 03:04:35,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4500, loss[loss=0.1213, beats_loss=0.01426, ecapa_loss=0.0002637, whisper_loss=0.1044, over 15075.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01233, ecapa_loss=0.0002942, whisper_loss=0.09908, over 3844681.98 frames. ], batch size: 58, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:04:43,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.06 vs. 
limit=15.0 2024-08-10 03:04:52,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+01 3.021e+01 3.497e+01 4.022e+01 7.846e+01, threshold=6.995e+01, percent-clipped=4.0 2024-08-10 03:05:03,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=335080.0, ans=0.125 2024-08-10 03:05:09,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=335080.0, ans=10.0 2024-08-10 03:05:11,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335080.0, ans=0.1 2024-08-10 03:05:21,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2024-08-10 03:05:29,723 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 03:05:30,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=335180.0, ans=0.125 2024-08-10 03:05:32,638 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 from AS 2024-08-10 03:05:38,243 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 03:05:46,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4550, loss[loss=0.1105, beats_loss=0.01365, ecapa_loss=0.0002846, whisper_loss=0.09401, over 15045.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01234, ecapa_loss=0.0002959, whisper_loss=0.09959, over 3867605.74 frames. ], batch size: 61, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:05:48,531 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 03:05:48,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=335380.0, ans=0.2 2024-08-10 03:05:56,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=335380.0, ans=0.025 2024-08-10 03:05:59,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=335380.0, ans=0.125 2024-08-10 03:06:32,032 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 from AS 2024-08-10 03:06:45,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335780.0, ans=0.125 2024-08-10 03:06:48,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335780.0, ans=0.125 2024-08-10 03:06:48,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=335780.0, ans=0.125 2024-08-10 03:06:50,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335780.0, ans=0.1 2024-08-10 03:06:58,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4600, loss[loss=0.1126, beats_loss=0.01309, ecapa_loss=0.0003101, whisper_loss=0.09637, over 19906.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.0002934, whisper_loss=0.09879, over 3870813.53 frames. ], batch size: 82, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:07:00,521 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.283e-01 2024-08-10 03:07:01,582 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 from AS 2024-08-10 03:07:14,317 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 18 from Vox, 49 from AS 2024-08-10 03:07:15,445 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.246e+01 3.685e+01 4.349e+01 7.107e+01, threshold=7.370e+01, percent-clipped=1.0 2024-08-10 03:07:18,691 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 03:07:20,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=335980.0, ans=0.125 2024-08-10 03:07:37,825 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 03:07:38,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336080.0, ans=0.125 2024-08-10 03:07:42,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=336180.0, ans=0.0 2024-08-10 03:07:46,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=336180.0, ans=0.1 2024-08-10 03:07:49,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=336180.0, ans=0.0 2024-08-10 03:07:50,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2024-08-10 03:07:55,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:08:00,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:08:10,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4650, loss[loss=0.1161, beats_loss=0.01364, ecapa_loss=0.0002299, whisper_loss=0.1002, over 19733.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01249, ecapa_loss=0.0002937, whisper_loss=0.09865, over 3881605.01 frames. ], batch size: 74, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:08:14,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-10 03:08:45,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=336580.0, ans=0.0 2024-08-10 03:08:56,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-10 03:09:04,728 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 03:09:05,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-10 03:09:07,259 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 03:09:08,666 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 13 from Vox, 25 from AS 2024-08-10 03:09:22,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=336880.0, ans=0.125 2024-08-10 03:09:23,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4700, loss[loss=0.09948, beats_loss=0.01562, ecapa_loss=0.0002632, whisper_loss=0.08122, over 22432.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01246, ecapa_loss=0.0002962, whisper_loss=0.09952, over 3882468.15 frames. ], batch size: 91, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:09:29,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=336880.0, ans=0.125 2024-08-10 03:09:40,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 3.034e+01 3.372e+01 4.313e+01 2.367e+02, threshold=6.744e+01, percent-clipped=2.0 2024-08-10 03:09:56,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=337080.0, ans=0.125 2024-08-10 03:09:56,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-08-10 03:10:01,932 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 from AS 2024-08-10 03:10:16,115 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 03:10:23,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=337280.0, ans=0.2 2024-08-10 03:10:26,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. 
limit=15.0 2024-08-10 03:10:34,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4750, loss[loss=0.1275, beats_loss=0.01227, ecapa_loss=0.0002835, whisper_loss=0.1124, over 20389.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01246, ecapa_loss=0.0002955, whisper_loss=0.09916, over 3871615.91 frames. ], batch size: 82, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:10:36,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=337380.0, ans=0.07 2024-08-10 03:10:46,332 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 from AS 2024-08-10 03:10:49,551 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 03:10:55,270 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-10 03:10:59,171 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 from AS 2024-08-10 03:11:01,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2024-08-10 03:11:11,396 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 from AS 2024-08-10 03:11:14,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337580.0, ans=0.125 2024-08-10 03:11:21,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=337680.0, ans=0.0 2024-08-10 03:11:37,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337780.0, ans=0.1 2024-08-10 03:11:39,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. 
limit=15.0 2024-08-10 03:11:47,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4800, loss[loss=0.1, beats_loss=0.01431, ecapa_loss=0.0002976, whisper_loss=0.08271, over 19578.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01251, ecapa_loss=0.0002954, whisper_loss=0.09898, over 3897828.81 frames. ], batch size: 82, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:11:55,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=15.0 2024-08-10 03:12:00,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=6.0 2024-08-10 03:12:03,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.006e+01 3.324e+01 3.733e+01 5.524e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-10 03:12:07,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337980.0, ans=0.125 2024-08-10 03:12:14,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.491e+00 2024-08-10 03:12:14,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=338080.0, ans=0.0 2024-08-10 03:12:48,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=12.0 2024-08-10 03:12:58,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4850, loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0002915, whisper_loss=0.09079, over 15710.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01254, ecapa_loss=0.0002941, whisper_loss=0.09949, over 3913226.59 frames. 
], batch size: 61, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:13:20,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=338480.0, ans=0.0 2024-08-10 03:13:25,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2024-08-10 03:13:26,259 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 03:13:38,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.23 vs. limit=22.5 2024-08-10 03:13:54,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.48 vs. limit=15.0 2024-08-10 03:13:56,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=338780.0, ans=0.0 2024-08-10 03:14:05,801 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 03:14:11,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4900, loss[loss=0.1351, beats_loss=0.008156, ecapa_loss=0.0004208, whisper_loss=0.1228, over 14533.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01249, ecapa_loss=0.0002938, whisper_loss=0.09955, over 3895415.41 frames. 
], batch size: 60, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:14:28,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 3.105e+01 3.477e+01 3.938e+01 7.192e+01, threshold=6.955e+01, percent-clipped=1.0 2024-08-10 03:14:34,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=338980.0, ans=0.125 2024-08-10 03:14:35,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=338980.0, ans=0.125 2024-08-10 03:14:42,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=339080.0, ans=10.0 2024-08-10 03:14:51,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339080.0, ans=0.1 2024-08-10 03:15:03,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339180.0, ans=0.125 2024-08-10 03:15:12,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.71 vs. limit=22.5 2024-08-10 03:15:17,406 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 from AS 2024-08-10 03:15:22,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 4950, loss[loss=0.1225, beats_loss=0.01033, ecapa_loss=0.0003236, whisper_loss=0.109, over 15131.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01243, ecapa_loss=0.0002934, whisper_loss=0.09959, over 3904936.51 frames. ], batch size: 60, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:15:28,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=339380.0, ans=0.0 2024-08-10 03:15:31,559 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
26 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 03:15:36,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=11.11 vs. limit=10.0 2024-08-10 03:15:41,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=339480.0, ans=0.0 2024-08-10 03:15:43,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=339480.0, ans=0.125 2024-08-10 03:15:49,951 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 from AS 2024-08-10 03:15:51,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=22.5 2024-08-10 03:15:53,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=339580.0, ans=0.125 2024-08-10 03:16:08,315 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 from AS 2024-08-10 03:16:10,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=339680.0, ans=0.0 2024-08-10 03:16:12,283 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS 2024-08-10 03:16:16,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=339680.0, ans=0.0 2024-08-10 03:16:24,265 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 03:16:24,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=339780.0, ans=0.125 2024-08-10 03:16:35,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=339880.0, ans=0.0 2024-08-10 03:16:36,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5000, loss[loss=0.08773, beats_loss=0.0136, ecapa_loss=0.000347, whisper_loss=0.07067, over 17155.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01243, ecapa_loss=0.0002958, whisper_loss=0.09908, over 3887265.08 frames. ], batch size: 76, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:16:37,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.50 vs. limit=10.0 2024-08-10 03:16:46,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=339880.0, ans=0.125 2024-08-10 03:16:53,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.921e+01 3.372e+01 3.826e+01 7.563e+01, threshold=6.744e+01, percent-clipped=1.0 2024-08-10 03:17:01,145 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 03:17:02,691 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 03:17:05,638 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.433e-01 2024-08-10 03:17:11,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=340080.0, ans=0.0 2024-08-10 03:17:16,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=340080.0, ans=0.0 2024-08-10 03:17:41,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340280.0, ans=0.125 2024-08-10 03:17:43,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2024-08-10 03:17:46,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=340280.0, ans=0.125 2024-08-10 03:17:48,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5050, loss[loss=0.0955, beats_loss=0.01351, ecapa_loss=0.0003497, whisper_loss=0.07849, over 15258.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01242, ecapa_loss=0.0002972, whisper_loss=0.0992, over 3899779.72 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 4194304.0 2024-08-10 03:17:50,146 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-10 03:17:57,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340380.0, ans=0.1 2024-08-10 03:18:05,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.69 vs. 
limit=15.0 2024-08-10 03:18:08,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=340480.0, ans=0.125 2024-08-10 03:18:09,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=340480.0, ans=0.125 2024-08-10 03:18:10,788 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 14 from Vox, 34 from AS 2024-08-10 03:18:14,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=340480.0, ans=0.2 2024-08-10 03:18:36,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=340680.0, ans=0.0 2024-08-10 03:18:47,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=340780.0, ans=0.2 2024-08-10 03:18:50,359 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-10 03:19:01,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5100, loss[loss=0.1194, beats_loss=0.01132, ecapa_loss=0.0002222, whisper_loss=0.1059, over 18140.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01244, ecapa_loss=0.0002935, whisper_loss=0.09991, over 3916066.67 frames. ], batch size: 69, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:19:12,877 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 03:19:19,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.974e+01 3.405e+01 3.841e+01 8.729e+01, threshold=6.810e+01, percent-clipped=2.0 2024-08-10 03:19:29,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=340980.0, ans=0.125 2024-08-10 03:19:32,619 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
12 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 03:19:43,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=341080.0, ans=0.0 2024-08-10 03:19:57,024 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 20 from Vox, 20 from AS 2024-08-10 03:20:06,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=341280.0, ans=0.0 2024-08-10 03:20:17,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5150, loss[loss=0.1123, beats_loss=0.01345, ecapa_loss=0.0003297, whisper_loss=0.09555, over 21596.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01245, ecapa_loss=0.0002921, whisper_loss=0.1003, over 3925376.31 frames. ], batch size: 92, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:20:34,543 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 03:21:13,575 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 35 from LS+wenet, 13 from Vox, 32 from AS 2024-08-10 03:21:18,422 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 from AS 2024-08-10 03:21:32,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5200, loss[loss=0.1099, beats_loss=0.01376, ecapa_loss=0.0002814, whisper_loss=0.09333, over 16527.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01238, ecapa_loss=0.0002924, whisper_loss=0.1001, over 3895306.71 frames. 
], batch size: 68, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:21:39,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=341880.0, ans=0.2 2024-08-10 03:21:51,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.930e+01 3.270e+01 3.996e+01 6.105e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-10 03:21:57,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341980.0, ans=0.1 2024-08-10 03:21:59,039 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 03:22:09,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=342080.0, ans=0.125 2024-08-10 03:22:33,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=342280.0, ans=22.5 2024-08-10 03:22:45,581 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 from AS 2024-08-10 03:22:46,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5250, loss[loss=0.1149, beats_loss=0.01205, ecapa_loss=0.0003595, whisper_loss=0.09928, over 20802.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0124, ecapa_loss=0.0002946, whisper_loss=0.09876, over 3879345.73 frames. ], batch size: 89, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:22:47,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=342380.0, ans=0.125 2024-08-10 03:22:49,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. 
limit=22.5 2024-08-10 03:22:52,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-10 03:23:04,148 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-10 03:23:08,232 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 03:23:27,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=342580.0, ans=0.125 2024-08-10 03:23:35,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=342680.0, ans=0.0 2024-08-10 03:23:42,849 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 03:23:48,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342780.0, ans=0.125 2024-08-10 03:23:50,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=342780.0, ans=0.2 2024-08-10 03:23:52,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=342780.0, ans=0.125 2024-08-10 03:24:02,514 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5300, loss[loss=0.1347, beats_loss=0.01245, ecapa_loss=0.0002991, whisper_loss=0.1193, over 22671.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01228, ecapa_loss=0.0002962, whisper_loss=0.09918, over 3877365.59 frames. ], batch size: 93, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:24:20,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.874e+01 3.315e+01 3.923e+01 7.752e+01, threshold=6.630e+01, percent-clipped=2.0 2024-08-10 03:24:21,906 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 03:24:23,170 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 03:24:26,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-08-10 03:24:30,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343080.0, ans=0.125 2024-08-10 03:24:51,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343180.0, ans=0.125 2024-08-10 03:24:57,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.94 vs. limit=15.0 2024-08-10 03:25:12,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=343280.0, ans=0.0 2024-08-10 03:25:15,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5350, loss[loss=0.105, beats_loss=0.01054, ecapa_loss=0.0002741, whisper_loss=0.09168, over 14585.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01223, ecapa_loss=0.000294, whisper_loss=0.09903, over 3849278.37 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:25:28,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.28 vs. limit=22.5 2024-08-10 03:25:38,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=343480.0, ans=0.125 2024-08-10 03:25:40,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. 
limit=15.0 2024-08-10 03:25:47,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=343580.0, ans=0.125 2024-08-10 03:25:55,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343580.0, ans=0.125 2024-08-10 03:26:21,487 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 03:26:22,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=343780.0, ans=0.0 2024-08-10 03:26:32,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5400, loss[loss=0.09826, beats_loss=0.01594, ecapa_loss=0.0002798, whisper_loss=0.07952, over 18180.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01227, ecapa_loss=0.0002918, whisper_loss=0.09892, over 3880336.36 frames. ], batch size: 74, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:26:39,963 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 03:26:50,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.952e+01 3.404e+01 3.987e+01 5.856e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 03:26:53,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=343980.0, ans=0.125 2024-08-10 03:26:54,318 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 03:27:38,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344280.0, ans=0.125 2024-08-10 03:27:45,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=344380.0, ans=0.125 2024-08-10 03:27:46,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5450, loss[loss=0.1198, beats_loss=0.01296, ecapa_loss=0.00029, whisper_loss=0.104, over 21755.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01238, ecapa_loss=0.000292, whisper_loss=0.09831, over 3876688.63 frames. ], batch size: 88, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:27:50,094 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 03:27:51,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=344380.0, ans=0.0 2024-08-10 03:27:51,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-08-10 03:28:14,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0 2024-08-10 03:28:38,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=344680.0, ans=0.0 2024-08-10 03:28:42,791 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 03:28:51,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.88 vs. 
limit=15.0 2024-08-10 03:28:54,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-08-10 03:29:00,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344780.0, ans=0.1 2024-08-10 03:29:03,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5500, loss[loss=0.1304, beats_loss=0.01109, ecapa_loss=0.0003444, whisper_loss=0.1159, over 20741.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01243, ecapa_loss=0.0002941, whisper_loss=0.09841, over 3901345.59 frames. ], batch size: 85, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:29:06,429 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 03:29:12,360 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 03:29:15,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=344880.0, ans=0.0 2024-08-10 03:29:22,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.947e+01 3.297e+01 3.879e+01 5.625e+01, threshold=6.594e+01, percent-clipped=0.0 2024-08-10 03:29:22,332 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 03:29:25,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=344980.0, ans=0.0 2024-08-10 03:29:30,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=344980.0, ans=0.09899494936611666 2024-08-10 03:29:52,303 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
15 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 03:30:16,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-08-10 03:30:19,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5550, loss[loss=0.13, beats_loss=0.01051, ecapa_loss=0.0003275, whisper_loss=0.1163, over 21472.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01233, ecapa_loss=0.0002953, whisper_loss=0.0991, over 3920368.01 frames. ], batch size: 88, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:30:25,043 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 03:30:29,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=345380.0, ans=0.0 2024-08-10 03:30:53,077 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 03:31:34,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=345880.0, ans=0.0 2024-08-10 03:31:35,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5600, loss[loss=0.1093, beats_loss=0.0129, ecapa_loss=0.0003226, whisper_loss=0.09317, over 21548.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01243, ecapa_loss=0.0002933, whisper_loss=0.09908, over 3942882.55 frames. 
], batch size: 88, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:31:39,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=345880.0, ans=0.04949747468305833 2024-08-10 03:31:39,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=345880.0, ans=0.125 2024-08-10 03:31:53,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.950e+01 3.327e+01 3.865e+01 5.194e+01, threshold=6.655e+01, percent-clipped=0.0 2024-08-10 03:32:00,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345980.0, ans=0.1 2024-08-10 03:32:21,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=346180.0, ans=0.07 2024-08-10 03:32:23,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346180.0, ans=0.1 2024-08-10 03:32:27,404 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-10 03:32:29,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=346180.0, ans=0.125 2024-08-10 03:32:30,397 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 03:32:32,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2024-08-10 03:32:39,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346280.0, ans=0.125 2024-08-10 03:32:40,692 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 03:32:49,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5650, loss[loss=0.1244, beats_loss=0.01256, ecapa_loss=0.0002663, whisper_loss=0.1092, over 23001.00 frames. ], tot_loss[loss=0.114, beats_loss=0.0126, ecapa_loss=0.0002903, whisper_loss=0.09851, over 3985452.51 frames. ], batch size: 90, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:32:50,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=346380.0, ans=0.0 2024-08-10 03:33:00,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=346380.0, ans=0.125 2024-08-10 03:33:13,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.96 vs. limit=15.0 2024-08-10 03:33:26,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=346580.0, ans=0.125 2024-08-10 03:33:32,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=346580.0, ans=0.125 2024-08-10 03:33:33,284 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 03:33:33,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=346680.0, ans=0.0 2024-08-10 03:33:39,071 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
23 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 03:33:57,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346780.0, ans=0.1 2024-08-10 03:33:59,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=346780.0, ans=0.0 2024-08-10 03:34:03,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=346780.0, ans=10.0 2024-08-10 03:34:05,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5700, loss[loss=0.08578, beats_loss=0.01579, ecapa_loss=0.000236, whisper_loss=0.06764, over 19417.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.0126, ecapa_loss=0.0002924, whisper_loss=0.09866, over 3970382.91 frames. ], batch size: 78, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:34:20,391 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 03:34:21,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=346980.0, ans=0.125 2024-08-10 03:34:23,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.923e+01 3.363e+01 4.122e+01 7.176e+01, threshold=6.726e+01, percent-clipped=2.0 2024-08-10 03:34:54,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=347180.0, ans=0.125 2024-08-10 03:35:22,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5750, loss[loss=0.1272, beats_loss=0.01156, ecapa_loss=0.0002761, whisper_loss=0.1128, over 23208.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01252, ecapa_loss=0.0002925, whisper_loss=0.09955, over 3976388.76 frames. 
], batch size: 90, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:35:33,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=347380.0, ans=0.125 2024-08-10 03:35:58,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=347580.0, ans=10.0 2024-08-10 03:36:01,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=347580.0, ans=0.2 2024-08-10 03:36:08,288 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 03:36:08,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-08-10 03:36:11,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-10 03:36:12,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=347680.0, ans=0.125 2024-08-10 03:36:25,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=347780.0, ans=0.125 2024-08-10 03:36:27,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347780.0, ans=0.125 2024-08-10 03:36:36,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5800, loss[loss=0.1293, beats_loss=0.01358, ecapa_loss=0.0002797, whisper_loss=0.1129, over 21617.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01257, ecapa_loss=0.0002917, whisper_loss=0.0993, over 3988024.53 frames. ], batch size: 87, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:36:37,751 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 03:36:44,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=347880.0, ans=0.125 2024-08-10 03:36:49,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=347880.0, ans=0.125 2024-08-10 03:36:54,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.824e+01 3.290e+01 3.735e+01 8.555e+01, threshold=6.581e+01, percent-clipped=2.0 2024-08-10 03:37:31,053 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 03:37:31,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=348180.0, ans=0.125 2024-08-10 03:37:40,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2024-08-10 03:37:41,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=348280.0, ans=0.0 2024-08-10 03:37:50,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5850, loss[loss=0.1199, beats_loss=0.01211, ecapa_loss=0.0003179, whisper_loss=0.1046, over 20504.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01251, ecapa_loss=0.0002932, whisper_loss=0.09946, over 3970374.87 frames. ], batch size: 84, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:37:56,720 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 03:37:58,412 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 03:38:27,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.04 vs. 
limit=22.5 2024-08-10 03:38:30,636 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 03:38:50,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=348780.0, ans=0.04949747468305833 2024-08-10 03:39:00,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5900, loss[loss=0.096, beats_loss=0.01629, ecapa_loss=0.0002229, whisper_loss=0.07748, over 23702.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01247, ecapa_loss=0.0002924, whisper_loss=0.09866, over 3945844.01 frames. ], batch size: 93, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:39:05,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=348880.0, ans=0.0 2024-08-10 03:39:16,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.929e+01 3.311e+01 3.794e+01 5.610e+01, threshold=6.621e+01, percent-clipped=0.0 2024-08-10 03:39:17,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=348980.0, ans=0.0 2024-08-10 03:39:23,963 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 03:39:38,609 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 03:39:46,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=349180.0, ans=0.0 2024-08-10 03:39:47,264 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 03:39:49,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-08-10 03:40:08,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=349380.0, ans=0.2 2024-08-10 03:40:09,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 5950, loss[loss=0.08531, beats_loss=0.01482, ecapa_loss=0.0002721, whisper_loss=0.06776, over 19204.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01249, ecapa_loss=0.0002926, whisper_loss=0.09888, over 3937416.52 frames. ], batch size: 79, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:40:22,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=349480.0, ans=0.0 2024-08-10 03:40:24,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349480.0, ans=0.125 2024-08-10 03:40:27,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=349480.0, ans=0.05 2024-08-10 03:40:27,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2024-08-10 03:40:39,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=349580.0, ans=0.125 2024-08-10 03:40:49,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349680.0, ans=0.125 2024-08-10 03:40:49,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349680.0, ans=0.125 2024-08-10 03:40:54,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.76 vs. 
limit=15.0 2024-08-10 03:41:10,842 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 03:41:18,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6000, loss[loss=0.124, beats_loss=0.01392, ecapa_loss=0.0002665, whisper_loss=0.1074, over 17123.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01252, ecapa_loss=0.0002918, whisper_loss=0.09816, over 3915242.11 frames. ], batch size: 69, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:41:18,609 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 03:41:56,871 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6685, 4.3496, 5.2007, 4.9432], device='cuda:3') 2024-08-10 03:41:57,770 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on ASR_libri: loss=0.2761, beats_loss=0, ecapa_loss=0.0008742, whisper_loss=0.2674, over 922467.00 frames. 2024-08-10 03:42:15,698 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on SV_voxceleb1: loss=0.007667, beats_loss=0, ecapa_loss=0.0007667, whisper_loss=0, over 939242.00 frames. 2024-08-10 03:44:14,944 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on AT_audioset: loss=0.0285, beats_loss=0.0285, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 03:44:14,948 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 03:44:22,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=349880.0, ans=0.125 2024-08-10 03:44:22,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=349880.0, ans=0.2 2024-08-10 03:44:32,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 3.043e+01 3.498e+01 4.267e+01 5.483e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-10 03:44:36,948 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
33 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 03:44:38,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=349980.0, ans=0.035 2024-08-10 03:44:42,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:44,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5 2024-08-10 03:44:53,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:57,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=350180.0, ans=0.125 2024-08-10 03:45:12,455 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 03:45:18,306 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 03:45:26,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6050, loss[loss=0.1357, beats_loss=0.01009, ecapa_loss=0.0003382, whisper_loss=0.1222, over 22666.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01239, ecapa_loss=0.0002915, whisper_loss=0.09982, over 3905674.07 frames. ], batch size: 91, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:45:26,707 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 03:45:31,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=350380.0, ans=0.125 2024-08-10 03:45:35,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=350380.0, ans=0.125 2024-08-10 03:45:53,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.21 vs. limit=10.0 2024-08-10 03:45:53,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=350580.0, ans=10.0 2024-08-10 03:46:14,839 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 03:46:36,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6100, loss[loss=0.1084, beats_loss=0.01392, ecapa_loss=0.0002643, whisper_loss=0.09184, over 17201.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01239, ecapa_loss=0.0002911, whisper_loss=0.09949, over 3882603.07 frames. ], batch size: 69, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:46:42,332 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 03:46:51,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350980.0, ans=0.1 2024-08-10 03:46:53,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.971e+01 3.424e+01 4.102e+01 1.085e+02, threshold=6.848e+01, percent-clipped=1.0 2024-08-10 03:46:56,113 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 03:46:57,832 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:47:16,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=351080.0, ans=0.0 2024-08-10 03:47:34,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.32 vs. limit=15.0 2024-08-10 03:47:43,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=351280.0, ans=0.09899494936611666 2024-08-10 03:47:45,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6150, loss[loss=0.1249, beats_loss=0.01091, ecapa_loss=0.000306, whisper_loss=0.1109, over 23194.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01235, ecapa_loss=0.00029, whisper_loss=0.09926, over 3865356.20 frames. ], batch size: 92, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:47:48,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-10 03:48:07,345 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.122e-01 2024-08-10 03:48:08,731 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 03:48:10,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=351480.0, ans=0.125 2024-08-10 03:48:10,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351480.0, ans=0.1 2024-08-10 03:48:15,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351580.0, ans=0.125 2024-08-10 03:48:21,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=351580.0, ans=0.125 2024-08-10 03:48:32,253 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 03:48:33,450 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 03:48:52,192 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 03:48:52,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351780.0, ans=0.125 2024-08-10 03:48:54,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6200, loss[loss=0.116, beats_loss=0.01124, ecapa_loss=0.0003167, whisper_loss=0.1016, over 22982.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01239, ecapa_loss=0.0002912, whisper_loss=0.09885, over 3891947.80 frames. 
], batch size: 94, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:49:11,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.480e+01 3.031e+01 3.409e+01 3.924e+01 5.999e+01, threshold=6.819e+01, percent-clipped=0.0
2024-08-10 03:49:15,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=351980.0, ans=0.125
2024-08-10 03:50:02,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6250, loss[loss=0.1151, beats_loss=0.01226, ecapa_loss=0.0002854, whisper_loss=0.09996, over 19247.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01236, ecapa_loss=0.0002908, whisper_loss=0.09958, over 3910541.91 frames. ], batch size: 74, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:50:14,037 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 11 from Vox, 28 from AS
2024-08-10 03:50:15,522 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 from AS
2024-08-10 03:50:20,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=352480.0, ans=0.125
2024-08-10 03:50:21,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0
2024-08-10 03:50:22,516 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 from AS
2024-08-10 03:50:26,578 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 from AS
2024-08-10 03:50:28,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=352480.0, ans=0.2
2024-08-10 03:50:50,748 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 from AS
2024-08-10 03:51:10,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6300, loss[loss=0.1459, beats_loss=0.009019, ecapa_loss=0.0003724, whisper_loss=0.1331, over 20754.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01224, ecapa_loss=0.0002945, whisper_loss=0.1001, over 3902733.16 frames. ], batch size: 84, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:51:27,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 3.057e+01 3.444e+01 4.179e+01 1.718e+02, threshold=6.888e+01, percent-clipped=1.0
2024-08-10 03:51:45,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=353080.0, ans=0.125
2024-08-10 03:51:56,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=353180.0, ans=0.0
2024-08-10 03:51:56,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=353180.0, ans=0.125
2024-08-10 03:52:02,396 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 from AS
2024-08-10 03:52:15,613 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 25 from Vox, 20 from AS
2024-08-10 03:52:19,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6350, loss[loss=0.1165, beats_loss=0.01123, ecapa_loss=0.0003189, whisper_loss=0.102, over 20800.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01231, ecapa_loss=0.0002963, whisper_loss=0.09984, over 3896312.10 frames. ], batch size: 81, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:52:21,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=353380.0, ans=0.125
2024-08-10 03:52:24,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=353380.0, ans=0.125
2024-08-10 03:52:36,184 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS
2024-08-10 03:52:44,929 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 10 from Vox, 22 from AS
2024-08-10 03:52:47,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=353580.0, ans=0.125
2024-08-10 03:52:59,701 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-10 03:53:02,433 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 from AS
2024-08-10 03:53:09,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=353680.0, ans=0.125
2024-08-10 03:53:15,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=353780.0, ans=0.0
2024-08-10 03:53:17,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0
2024-08-10 03:53:20,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. limit=10.0
2024-08-10 03:53:28,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6400, loss[loss=0.1177, beats_loss=0.01324, ecapa_loss=0.0002823, whisper_loss=0.1016, over 20126.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01233, ecapa_loss=0.0002952, whisper_loss=0.09956, over 3883296.26 frames. ], batch size: 80, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:53:31,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=353880.0, ans=0.125
2024-08-10 03:53:36,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=353880.0, ans=0.09899494936611666
2024-08-10 03:53:42,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=353980.0, ans=0.0
2024-08-10 03:53:44,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.873e+01 3.233e+01 3.602e+01 5.742e+01, threshold=6.465e+01, percent-clipped=0.0
2024-08-10 03:53:52,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2024-08-10 03:53:57,303 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 from AS
2024-08-10 03:54:04,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=354080.0, ans=0.2
2024-08-10 03:54:14,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=354180.0, ans=0.125
2024-08-10 03:54:25,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=354280.0, ans=0.125
2024-08-10 03:54:36,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=354380.0, ans=0.0
2024-08-10 03:54:37,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6450, loss[loss=0.09443, beats_loss=0.01449, ecapa_loss=0.0003169, whisper_loss=0.07677, over 21396.00 frames.
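The loss[...] entries above break the training objective into three knowledge-distillation components. A minimal sketch of how the logged `loss` appears to be composed, assuming the weighted sum uses the scales from this run's config (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0); this is an illustration, not the actual train_multi_KD3.py code:

```python
# Hypothetical helper: recombine the logged loss components with the
# configured scales (beats 1.0, ecapa 10.0, whisper 1.0 in this run).
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Checking against the "Epoch 3, batch 6250" entry:
# 0.01226 + 10 * 0.0002854 + 0.09996 ≈ 0.1151, matching the logged loss.
print(round(combined_loss(0.01226, 0.0002854, 0.09996), 4))
```

Note how the large `ecapa_loss_scale` compensates for the much smaller magnitude of the speaker-embedding loss, so all three teachers contribute meaningfully to the total.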
], tot_loss[loss=0.1144, beats_loss=0.01236, ecapa_loss=0.0002946, whisper_loss=0.09909, over 3881582.41 frames. ], batch size: 94, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:54:42,718 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 from AS
2024-08-10 03:54:48,654 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.319e+03
2024-08-10 03:54:51,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=354480.0, ans=0.07
2024-08-10 03:54:58,406 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 from AS
2024-08-10 03:55:13,835 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 from AS
2024-08-10 03:55:26,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=354680.0, ans=0.125
2024-08-10 03:55:27,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=354680.0, ans=0.125
2024-08-10 03:55:41,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=354780.0, ans=0.0
2024-08-10 03:55:46,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6500, loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0003018, whisper_loss=0.08952, over 18660.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01226, ecapa_loss=0.000295, whisper_loss=0.09957, over 3867579.14 frames. ], batch size: 74, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:55:50,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0
2024-08-10 03:55:54,251 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS
2024-08-10 03:55:56,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=354880.0, ans=12.0
2024-08-10 03:56:01,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=354980.0, ans=0.125
2024-08-10 03:56:02,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.914e+01 3.314e+01 3.758e+01 6.768e+01, threshold=6.629e+01, percent-clipped=1.0
2024-08-10 03:56:21,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=355080.0, ans=0.125
2024-08-10 03:56:33,242 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 13 from Vox, 32 from AS
2024-08-10 03:56:34,762 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 from AS
2024-08-10 03:56:44,019 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 from AS
2024-08-10 03:56:48,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355280.0, ans=0.125
2024-08-10 03:56:53,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6550, loss[loss=0.1408, beats_loss=0.007017, ecapa_loss=0.0003056, whisper_loss=0.1307, over 16184.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0122, ecapa_loss=0.0002959, whisper_loss=0.1009, over 3912196.43 frames. ], batch size: 60, lr: 1.91e-02, grad_scale: 4194304.0
2024-08-10 03:57:11,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355480.0, ans=0.1
2024-08-10 03:57:24,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=355580.0, ans=0.125
2024-08-10 03:57:24,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0
2024-08-10 03:57:31,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2024-08-10 03:57:36,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355680.0, ans=0.125
2024-08-10 03:57:42,006 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.310e-01
2024-08-10 03:57:44,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=355680.0, ans=0.125
2024-08-10 03:58:01,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6600, loss[loss=0.1092, beats_loss=0.01283, ecapa_loss=0.0003342, whisper_loss=0.09306, over 19598.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.0122, ecapa_loss=0.0002973, whisper_loss=0.1004, over 3904873.59 frames. ], batch size: 84, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 03:58:02,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.32 vs. limit=15.0
2024-08-10 03:58:12,778 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 from AS
2024-08-10 03:58:18,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.136e+01 3.510e+01 4.053e+01 6.821e+01, threshold=7.019e+01, percent-clipped=1.0
2024-08-10 03:58:33,387 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS
2024-08-10 03:58:36,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=356080.0, ans=0.5
2024-08-10 03:58:43,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=356180.0, ans=0.0
2024-08-10 03:59:02,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=356280.0, ans=0.125
2024-08-10 03:59:10,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6650, loss[loss=0.1173, beats_loss=0.01176, ecapa_loss=0.0002382, whisper_loss=0.1031, over 24280.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01221, ecapa_loss=0.0002943, whisper_loss=0.1003, over 3911737.79 frames. ], batch size: 93, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 03:59:13,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=356380.0, ans=0.025
2024-08-10 03:59:30,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2024-08-10 03:59:44,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0
2024-08-10 03:59:44,938 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
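The optim.py:476 lines report grad-norm quartiles together with a clipping threshold and a percent-clipped figure. The logged numbers are consistent with the threshold being Clipping_scale times the median gradient norm (e.g. 2.0 × 3.409e+01 = 6.819e+01 in the 03:49:11 entry). A minimal sketch under that assumption; this is not the actual icefall optim.py implementation:

```python
import statistics

# Hedged sketch: derive the clipping threshold as clipping_scale * median of
# recent gradient norms, and report the share of norms exceeding it.
def clipping_stats(grad_norms, clipping_scale=2.0):
    threshold = clipping_scale * statistics.median(grad_norms)
    percent_clipped = 100.0 * sum(g > threshold for g in grad_norms) / len(grad_norms)
    return threshold, percent_clipped

# A heavy-tailed toy batch: median 30 -> threshold 60, one of five norms clipped.
threshold, pct = clipping_stats([10.0, 20.0, 30.0, 40.0, 200.0])
print(threshold, pct)
```

Tying the threshold to the running median makes the clip adaptive: occasional spikes (like the 1.718e+02 maximum at 03:51:27) get clipped without choking typical updates.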
23 from LS+wenet, 24 from Vox, 43 from AS
2024-08-10 03:59:46,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=356580.0, ans=0.0
2024-08-10 04:00:01,219 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 from AS
2024-08-10 04:00:19,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6700, loss[loss=0.1011, beats_loss=0.009651, ecapa_loss=0.0004292, whisper_loss=0.08713, over 16302.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01228, ecapa_loss=0.0002918, whisper_loss=0.09979, over 3918706.30 frames. ], batch size: 69, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:00:35,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.886e+01 3.265e+01 3.693e+01 7.385e+01, threshold=6.529e+01, percent-clipped=1.0
2024-08-10 04:00:36,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356980.0, ans=0.125
2024-08-10 04:00:40,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=356980.0, ans=0.0
2024-08-10 04:00:53,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=357080.0, ans=0.035
2024-08-10 04:01:00,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=357180.0, ans=0.125
2024-08-10 04:01:11,976 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 24 from Vox, 31 from AS
2024-08-10 04:01:25,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357280.0, ans=0.1
2024-08-10 04:01:28,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6750, loss[loss=0.1124, beats_loss=0.01291, ecapa_loss=0.0003073, whisper_loss=0.09637, over 22075.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01231, ecapa_loss=0.0002914, whisper_loss=0.09992, over 3937208.31 frames. ], batch size: 94, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:02:08,624 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 from AS
2024-08-10 04:02:27,721 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 from AS
2024-08-10 04:02:30,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=357780.0, ans=0.125
2024-08-10 04:02:37,207 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6800, loss[loss=0.1132, beats_loss=0.01342, ecapa_loss=0.0002667, whisper_loss=0.09708, over 21232.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01223, ecapa_loss=0.0002925, whisper_loss=0.1004, over 3925155.71 frames. ], batch size: 84, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:02:45,868 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 from AS
2024-08-10 04:02:54,011 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.970e+01 3.321e+01 3.801e+01 1.301e+02, threshold=6.643e+01, percent-clipped=3.0
2024-08-10 04:03:04,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358080.0, ans=0.125
2024-08-10 04:03:06,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358080.0, ans=0.1
2024-08-10 04:03:33,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
2024-08-10 04:03:46,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6850, loss[loss=0.1173, beats_loss=0.01245, ecapa_loss=0.0002894, whisper_loss=0.1019, over 23049.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.0123, ecapa_loss=0.0002906, whisper_loss=0.09971, over 3924490.62 frames. ], batch size: 93, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:03:52,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=358380.0, ans=0.125
2024-08-10 04:03:53,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=15.0
2024-08-10 04:04:03,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=358480.0, ans=0.125
2024-08-10 04:04:13,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=358580.0, ans=0.0
2024-08-10 04:04:19,531 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 from AS
2024-08-10 04:04:21,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358580.0, ans=0.125
2024-08-10 04:04:30,660 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 from AS
2024-08-10 04:04:49,306 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 from AS
2024-08-10 04:04:54,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6900, loss[loss=0.1475, beats_loss=0.0117, ecapa_loss=0.0002284, whisper_loss=0.1335, over 20422.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01234, ecapa_loss=0.0002901, whisper_loss=0.09959, over 3919462.57 frames. ], batch size: 77, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:05:10,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.982e+01 3.330e+01 3.890e+01 5.660e+01, threshold=6.660e+01, percent-clipped=0.0
2024-08-10 04:05:37,641 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 04:05:39,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=359180.0, ans=0.125
2024-08-10 04:05:43,069 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 from AS
2024-08-10 04:05:59,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359280.0, ans=0.0
2024-08-10 04:06:03,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 6950, loss[loss=0.1238, beats_loss=0.009785, ecapa_loss=0.0003305, whisper_loss=0.1107, over 17205.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01228, ecapa_loss=0.0002897, whisper_loss=0.09999, over 3866400.25 frames. ], batch size: 69, lr: 1.90e-02, grad_scale: 4194304.0
2024-08-10 04:06:26,450 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 04:06:29,140 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 from AS
2024-08-10 04:06:51,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359680.0, ans=0.1
2024-08-10 04:06:56,889 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
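The many scaling.py:214 ScheduledFloat entries report a scalar hyperparameter (a dropout probability, skip rate, scale bound, etc.) whose value (`ans`) depends on `batch_count`. A minimal sketch of piecewise-linear scheduling over batch count, which is what these logs appear to reflect; the schedule points below are illustrative assumptions, not values taken from this run:

```python
# Hedged sketch of a batch-count-scheduled float: interpolate linearly
# between (batch_count, value) breakpoints, clamping outside the range.
def scheduled_float(batch_count, points):
    # points: sorted list of (batch_count, value) pairs, at least one entry
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint: hold the final value

# Hypothetical schedule: a prob decaying from 0.3 to 0.125 over 20k batches.
print(scheduled_float(10000, [(0.0, 0.3), (20000.0, 0.125)]))
```

This explains why the same named parameter is logged repeatedly with different `ans` values as `batch_count` advances.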
20 from LS+wenet, 11 from Vox, 25 from AS
2024-08-10 04:06:58,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=359780.0, ans=0.2
2024-08-10 04:07:13,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7000, loss[loss=0.1416, beats_loss=0.009286, ecapa_loss=0.0002882, whisper_loss=0.1295, over 23026.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01216, ecapa_loss=0.0002915, whisper_loss=0.1008, over 3857523.18 frames. ], batch size: 87, lr: 1.89e-02, grad_scale: 4194304.0
2024-08-10 04:07:20,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=359880.0, ans=0.125
2024-08-10 04:07:32,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.852e+01 3.263e+01 3.844e+01 5.295e+01, threshold=6.525e+01, percent-clipped=0.0
2024-08-10 04:07:38,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=359980.0, ans=0.125
2024-08-10 04:08:01,722 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS
2024-08-10 04:08:03,070 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 12 from LS+wenet, 13 from Vox, 42 from AS
2024-08-10 04:08:05,896 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 12 from Vox, 38 from AS
2024-08-10 04:08:13,623 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS
2024-08-10 04:08:15,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=360280.0, ans=0.125
2024-08-10 04:08:19,240 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 from AS
2024-08-10 04:08:23,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=360380.0, ans=0.125
2024-08-10 04:08:24,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7050, loss[loss=0.1227, beats_loss=0.0127, ecapa_loss=0.0002218, whisper_loss=0.1078, over 17192.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01225, ecapa_loss=0.0002894, whisper_loss=0.1007, over 3879886.00 frames. ], batch size: 63, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:08:26,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=360380.0, ans=0.125
2024-08-10 04:08:34,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=360380.0, ans=15.0
2024-08-10 04:08:39,787 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 12 from Vox, 33 from AS
2024-08-10 04:08:41,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=360480.0, ans=0.125
2024-08-10 04:08:41,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.46 vs. limit=22.5
2024-08-10 04:08:59,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360580.0, ans=0.1
2024-08-10 04:09:03,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5
2024-08-10 04:09:14,317 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS
2024-08-10 04:09:14,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=360680.0, ans=0.125
2024-08-10 04:09:32,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7100, loss[loss=0.1324, beats_loss=0.01103, ecapa_loss=0.0002717, whisper_loss=0.1187, over 15344.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01228, ecapa_loss=0.0002863, whisper_loss=0.1004, over 3872107.95 frames. ], batch size: 59, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:09:36,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.37 vs. limit=15.0
2024-08-10 04:09:48,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0
2024-08-10 04:09:48,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.064e+01 3.569e+01 4.090e+01 1.167e+02, threshold=7.137e+01, percent-clipped=2.0
2024-08-10 04:09:55,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=360980.0, ans=0.1
2024-08-10 04:10:07,250 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 34 from Vox, 33 from AS
2024-08-10 04:10:07,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361080.0, ans=0.1
2024-08-10 04:10:22,424 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 15 from Vox, 41 from AS
2024-08-10 04:10:26,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=361280.0, ans=0.2
2024-08-10 04:10:30,810 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 from AS
2024-08-10 04:10:41,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7150, loss[loss=0.1074, beats_loss=0.007911, ecapa_loss=0.0003612, whisper_loss=0.09588, over 14664.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01232, ecapa_loss=0.0002884, whisper_loss=0.09976, over 3887785.66 frames. ], batch size: 57, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:10:41,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=361380.0, ans=0.09899494936611666
2024-08-10 04:10:45,589 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS
2024-08-10 04:10:51,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5
2024-08-10 04:11:05,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=361480.0, ans=0.125
2024-08-10 04:11:18,065 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 from AS
2024-08-10 04:11:20,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=361580.0, ans=0.125
2024-08-10 04:11:21,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=361580.0, ans=0.125
2024-08-10 04:11:26,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5
2024-08-10 04:11:30,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361680.0, ans=0.1
2024-08-10 04:11:41,665 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
24 from LS+wenet, 26 from Vox, 41 from AS
2024-08-10 04:11:50,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7200, loss[loss=0.1019, beats_loss=0.0135, ecapa_loss=0.0002047, whisper_loss=0.08635, over 23187.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01239, ecapa_loss=0.0002878, whisper_loss=0.09869, over 3869328.67 frames. ], batch size: 90, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:12:00,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361880.0, ans=0.1
2024-08-10 04:12:07,359 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.897e+01 3.291e+01 3.668e+01 6.348e+01, threshold=6.581e+01, percent-clipped=0.0
2024-08-10 04:12:14,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=361980.0, ans=0.0
2024-08-10 04:12:18,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2024-08-10 04:12:57,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=362280.0, ans=0.125
2024-08-10 04:13:01,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7250, loss[loss=0.1031, beats_loss=0.01245, ecapa_loss=0.0003134, whisper_loss=0.08748, over 22747.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.0123, ecapa_loss=0.000289, whisper_loss=0.09956, over 3883843.02 frames. ], batch size: 92, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:13:12,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0
2024-08-10 04:13:32,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=362580.0, ans=0.0
2024-08-10 04:13:33,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=362580.0, ans=0.07
2024-08-10 04:13:55,068 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 04:14:12,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7300, loss[loss=0.142, beats_loss=0.009671, ecapa_loss=0.0003499, whisper_loss=0.1288, over 19376.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01216, ecapa_loss=0.0002923, whisper_loss=0.1009, over 3868547.84 frames. ], batch size: 80, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:14:19,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362880.0, ans=0.125
2024-08-10 04:14:30,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.958e+01 3.364e+01 4.070e+01 6.476e+01, threshold=6.728e+01, percent-clipped=0.0
2024-08-10 04:14:38,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.20 vs. limit=15.0
2024-08-10 04:14:43,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0
2024-08-10 04:14:48,892 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 16 from Vox, 35 from AS
2024-08-10 04:14:53,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-08-10 04:14:55,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5
2024-08-10 04:15:11,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=363280.0, ans=0.05
2024-08-10 04:15:23,026 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 12 from Vox, 32 from AS
2024-08-10 04:15:24,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7350, loss[loss=0.1097, beats_loss=0.01431, ecapa_loss=0.0002475, whisper_loss=0.09292, over 15908.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01215, ecapa_loss=0.0002915, whisper_loss=0.1009, over 3891635.45 frames. ], batch size: 62, lr: 1.89e-02, grad_scale: 8388608.0
2024-08-10 04:15:29,200 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 from AS
2024-08-10 04:15:29,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363380.0, ans=0.1
2024-08-10 04:15:39,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=363480.0, ans=0.0
2024-08-10 04:15:55,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2024-08-10 04:15:57,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=363580.0, ans=15.0
2024-08-10 04:16:02,793 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 from AS
2024-08-10 04:16:17,400 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-10 04:16:24,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=363780.0, ans=0.2
2024-08-10 04:16:24,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363780.0, ans=0.1
2024-08-10 04:16:33,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=363780.0, ans=0.125
2024-08-10 04:16:33,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=363780.0, ans=0.0
2024-08-10 04:16:33,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.66 vs. limit=10.0
2024-08-10 04:16:37,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7400, loss[loss=0.1206, beats_loss=0.01468, ecapa_loss=0.0002894, whisper_loss=0.103, over 22646.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01216, ecapa_loss=0.0002901, whisper_loss=0.101, over 3902765.61 frames. ], batch size: 91, lr: 1.88e-02, grad_scale: 8388608.0
2024-08-10 04:16:39,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0
2024-08-10 04:16:41,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=363880.0, ans=0.125
2024-08-10 04:16:47,404 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts.
18 from LS+wenet, 22 from Vox, 43 from AS 2024-08-10 04:16:53,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363980.0, ans=0.1 2024-08-10 04:16:54,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.042e+01 3.418e+01 4.034e+01 8.204e+01, threshold=6.837e+01, percent-clipped=2.0 2024-08-10 04:16:56,590 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 04:17:00,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363980.0, ans=0.1 2024-08-10 04:17:13,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=364080.0, ans=0.0 2024-08-10 04:17:15,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=364080.0, ans=0.125 2024-08-10 04:17:22,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.86 vs. limit=15.0 2024-08-10 04:17:30,311 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS 2024-08-10 04:17:31,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=364180.0, ans=0.0 2024-08-10 04:17:43,652 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 from AS 2024-08-10 04:17:45,784 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 04:17:49,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7450, loss[loss=0.1082, beats_loss=0.01335, ecapa_loss=0.0002726, whisper_loss=0.09213, over 22305.00 frames.
], tot_loss[loss=0.1157, beats_loss=0.01227, ecapa_loss=0.0002895, whisper_loss=0.1006, over 3923256.15 frames. ], batch size: 91, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:17:49,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=364380.0, ans=0.125 2024-08-10 04:17:51,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=364380.0, ans=0.0 2024-08-10 04:18:08,792 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 32 from Vox, 30 from AS 2024-08-10 04:18:25,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=364580.0, ans=0.125 2024-08-10 04:18:31,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364580.0, ans=0.0 2024-08-10 04:18:53,330 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS 2024-08-10 04:18:56,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=364780.0, ans=0.2 2024-08-10 04:19:00,723 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 24 from Vox, 20 from AS 2024-08-10 04:19:03,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7500, loss[loss=0.1225, beats_loss=0.01164, ecapa_loss=0.0003219, whisper_loss=0.1076, over 17460.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01231, ecapa_loss=0.0002886, whisper_loss=0.09994, over 3888435.95 frames. ], batch size: 72, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:19:10,830 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
18 from LS+wenet, 13 from Vox, 24 from AS 2024-08-10 04:19:20,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.956e+01 3.355e+01 3.815e+01 8.528e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 04:19:40,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=365080.0, ans=0.2 2024-08-10 04:20:01,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-10 04:20:06,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0 2024-08-10 04:20:07,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=365280.0, ans=0.125 2024-08-10 04:20:09,736 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 30 from Vox, 27 from AS 2024-08-10 04:20:16,887 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7550, loss[loss=0.1467, beats_loss=0.009291, ecapa_loss=0.0003281, whisper_loss=0.1342, over 18486.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01236, ecapa_loss=0.0002896, whisper_loss=0.09973, over 3876096.70 frames. ], batch size: 71, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:20:26,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=365380.0, ans=0.0 2024-08-10 04:20:27,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365380.0, ans=0.125 2024-08-10 04:20:30,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=365480.0, ans=0.125 2024-08-10 04:20:31,665 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
30 from LS+wenet, 14 from Vox, 31 from AS 2024-08-10 04:20:49,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365580.0, ans=0.1 2024-08-10 04:20:56,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-10 04:21:15,201 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 04:21:30,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7600, loss[loss=0.09727, beats_loss=0.01141, ecapa_loss=0.0004092, whisper_loss=0.08177, over 16770.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01228, ecapa_loss=0.0002932, whisper_loss=0.09992, over 3848005.00 frames. ], batch size: 72, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:21:46,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 3.083e+01 3.503e+01 3.988e+01 6.295e+01, threshold=7.005e+01, percent-clipped=0.0 2024-08-10 04:21:52,218 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 04:22:30,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=366280.0, ans=15.0 2024-08-10 04:22:33,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=366280.0, ans=0.125 2024-08-10 04:22:41,507 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 from AS 2024-08-10 04:22:42,560 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7650, loss[loss=0.1083, beats_loss=0.01365, ecapa_loss=0.0003003, whisper_loss=0.09169, over 22797.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01219, ecapa_loss=0.0002922, whisper_loss=0.1001, over 3829646.41 frames.
], batch size: 95, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:22:42,802 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 04:22:44,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=366380.0, ans=0.125 2024-08-10 04:22:47,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.76 vs. limit=10.0 2024-08-10 04:22:49,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=366380.0, ans=0.125 2024-08-10 04:22:58,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=366480.0, ans=0.125 2024-08-10 04:23:00,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366480.0, ans=0.125 2024-08-10 04:23:10,748 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 04:23:15,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=366580.0, ans=0.025 2024-08-10 04:23:18,891 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 04:23:23,543 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 from AS 2024-08-10 04:23:29,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.38 vs.
limit=15.0 2024-08-10 04:23:34,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=366680.0, ans=0.0 2024-08-10 04:23:54,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7700, loss[loss=0.1251, beats_loss=0.0111, ecapa_loss=0.0003279, whisper_loss=0.1107, over 23038.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01221, ecapa_loss=0.0002944, whisper_loss=0.09958, over 3844349.31 frames. ], batch size: 93, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:23:54,500 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 from AS 2024-08-10 04:24:04,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2024-08-10 04:24:12,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.962e+01 3.373e+01 3.972e+01 7.552e+01, threshold=6.745e+01, percent-clipped=1.0 2024-08-10 04:24:22,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:22,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:24,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:31,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367080.0, ans=0.1 2024-08-10 04:24:52,941 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
18 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 04:24:54,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=367280.0, ans=0.2 2024-08-10 04:24:56,909 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS 2024-08-10 04:24:58,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=367280.0, ans=0.125 2024-08-10 04:25:04,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367280.0, ans=0.1 2024-08-10 04:25:06,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7750, loss[loss=0.1373, beats_loss=0.01148, ecapa_loss=0.0002184, whisper_loss=0.1236, over 24073.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01229, ecapa_loss=0.0002917, whisper_loss=0.09952, over 3846185.10 frames. ], batch size: 90, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:25:07,037 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 from AS 2024-08-10 04:25:13,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=367380.0, ans=0.125 2024-08-10 04:25:27,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367480.0, ans=0.125 2024-08-10 04:25:34,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-08-10 04:25:36,739 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 from AS 2024-08-10 04:25:43,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.89 vs.
limit=22.5 2024-08-10 04:25:48,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367680.0, ans=0.1 2024-08-10 04:26:00,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=15.0 2024-08-10 04:26:01,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-10 04:26:05,286 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-10 04:26:11,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=367780.0, ans=0.0 2024-08-10 04:26:13,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=367780.0, ans=0.025 2024-08-10 04:26:18,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7800, loss[loss=0.1252, beats_loss=0.009197, ecapa_loss=0.000296, whisper_loss=0.1131, over 16199.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01234, ecapa_loss=0.0002891, whisper_loss=0.09954, over 3843876.97 frames. ], batch size: 62, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:26:24,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=367880.0, ans=0.2 2024-08-10 04:26:25,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.08 vs.
limit=22.5 2024-08-10 04:26:34,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=367980.0, ans=0.0 2024-08-10 04:26:34,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 3.081e+01 3.363e+01 3.893e+01 6.913e+01, threshold=6.726e+01, percent-clipped=1.0 2024-08-10 04:26:47,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-08-10 04:26:49,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.56 vs. limit=10.0 2024-08-10 04:26:53,365 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS 2024-08-10 04:26:53,614 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:27:05,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.79 vs. limit=22.5 2024-08-10 04:27:05,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=368180.0, ans=0.2 2024-08-10 04:27:08,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=368180.0, ans=0.025 2024-08-10 04:27:09,767 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 04:27:28,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7850, loss[loss=0.109, beats_loss=0.01083, ecapa_loss=0.0003137, whisper_loss=0.09498, over 14531.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01229, ecapa_loss=0.0002911, whisper_loss=0.09938, over 3843246.18 frames.
], batch size: 56, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:27:36,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=368380.0, ans=0.125 2024-08-10 04:27:38,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=368380.0, ans=0.0 2024-08-10 04:27:39,959 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 04:27:41,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=368480.0, ans=0.1 2024-08-10 04:28:09,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=15.0 2024-08-10 04:28:17,828 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 04:28:36,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=368780.0, ans=0.125 2024-08-10 04:28:38,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0 2024-08-10 04:28:38,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7900, loss[loss=0.1201, beats_loss=0.01412, ecapa_loss=0.0002467, whisper_loss=0.1035, over 20675.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01229, ecapa_loss=0.0002897, whisper_loss=0.1008, over 3831355.15 frames.
], batch size: 79, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:28:51,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=368980.0, ans=0.0 2024-08-10 04:28:54,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.975e+01 3.379e+01 4.027e+01 6.816e+01, threshold=6.758e+01, percent-clipped=1.0 2024-08-10 04:28:54,897 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 11 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 04:29:00,677 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 from AS 2024-08-10 04:29:00,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=368980.0, ans=0.0 2024-08-10 04:29:02,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368980.0, ans=0.1 2024-08-10 04:29:04,584 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 04:29:07,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=369080.0, ans=0.125 2024-08-10 04:29:11,678 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 04:29:17,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5 2024-08-10 04:29:19,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.06 vs. limit=10.0 2024-08-10 04:29:22,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369180.0, ans=0.1 2024-08-10 04:29:26,974 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
24 from LS+wenet, 27 from Vox, 43 from AS 2024-08-10 04:29:30,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=369180.0, ans=0.125 2024-08-10 04:29:36,798 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 04:29:37,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2024-08-10 04:29:43,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2024-08-10 04:29:47,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 7950, loss[loss=0.08558, beats_loss=0.01591, ecapa_loss=0.000238, whisper_loss=0.06729, over 18419.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01234, ecapa_loss=0.0002897, whisper_loss=0.09944, over 3842788.86 frames. ], batch size: 75, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:29:54,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369380.0, ans=0.125 2024-08-10 04:30:15,517 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 04:30:21,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369580.0, ans=0.125 2024-08-10 04:30:22,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=369580.0, ans=0.0 2024-08-10 04:30:37,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369680.0, ans=0.125 2024-08-10 04:30:43,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=369780.0, ans=0.125 2024-08-10 04:30:48,574 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 04:30:51,383 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 04:30:56,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8000, loss[loss=0.1153, beats_loss=0.01285, ecapa_loss=0.000246, whisper_loss=0.09995, over 20160.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01226, ecapa_loss=0.0002906, whisper_loss=0.09999, over 3858146.92 frames. ], batch size: 79, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:31:03,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=369880.0, ans=0.1 2024-08-10 04:31:13,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 3.026e+01 3.341e+01 3.954e+01 6.055e+01, threshold=6.681e+01, percent-clipped=0.0 2024-08-10 04:31:38,470 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 04:31:39,973 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
36 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 04:31:57,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=370280.0, ans=0.2 2024-08-10 04:32:05,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8050, loss[loss=0.1045, beats_loss=0.01005, ecapa_loss=0.0002755, whisper_loss=0.09166, over 15389.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.0122, ecapa_loss=0.0002906, whisper_loss=0.1005, over 3852214.37 frames. ], batch size: 59, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:32:23,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=370480.0, ans=0.0 2024-08-10 04:32:32,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=370580.0, ans=0.2 2024-08-10 04:32:32,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=370580.0, ans=0.125 2024-08-10 04:32:35,436 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 14 from Vox, 49 from AS 2024-08-10 04:32:41,027 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 04:33:14,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8100, loss[loss=0.1028, beats_loss=0.01414, ecapa_loss=0.0002536, whisper_loss=0.08617, over 21270.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01232, ecapa_loss=0.0002872, whisper_loss=0.09989, over 3888371.89 frames. ], batch size: 88, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:33:18,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=370880.0, ans=0.035 2024-08-10 04:33:21,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.04 vs.
limit=22.5 2024-08-10 04:33:26,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=370880.0, ans=0.125 2024-08-10 04:33:31,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.921e+01 3.268e+01 3.818e+01 1.425e+02, threshold=6.536e+01, percent-clipped=1.0 2024-08-10 04:34:13,185 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 04:34:23,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8150, loss[loss=0.133, beats_loss=0.009408, ecapa_loss=0.0003274, whisper_loss=0.1203, over 23224.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01234, ecapa_loss=0.00029, whisper_loss=0.09976, over 3944545.48 frames. ], batch size: 94, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:34:26,904 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 04:34:29,617 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 04:34:44,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371480.0, ans=0.1 2024-08-10 04:35:31,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8200, loss[loss=0.1407, beats_loss=0.009023, ecapa_loss=0.0002588, whisper_loss=0.1291, over 20754.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01233, ecapa_loss=0.00029, whisper_loss=0.09972, over 3947954.76 frames. ], batch size: 77, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:35:38,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.99 vs. limit=10.0 2024-08-10 04:35:39,255 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
16 from LS+wenet, 18 from Vox, 21 from AS 2024-08-10 04:35:48,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.993e+01 3.348e+01 3.834e+01 8.342e+01, threshold=6.697e+01, percent-clipped=3.0 2024-08-10 04:35:54,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2024-08-10 04:35:55,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-08-10 04:35:58,876 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 17 from LS+wenet, 25 from Vox, 50 from AS 2024-08-10 04:36:00,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=372080.0, ans=0.0 2024-08-10 04:36:16,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=372180.0, ans=0.125 2024-08-10 04:36:30,915 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 11 from Vox, 39 from AS 2024-08-10 04:36:32,887 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.621e-01 2024-08-10 04:36:42,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8250, loss[loss=0.1274, beats_loss=0.01149, ecapa_loss=0.000279, whisper_loss=0.1131, over 19923.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01234, ecapa_loss=0.0002891, whisper_loss=0.09976, over 3952142.33 frames.
], batch size: 75, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:36:53,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=372380.0, ans=0.1 2024-08-10 04:37:10,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372580.0, ans=0.0 2024-08-10 04:37:10,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2024-08-10 04:37:15,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.09 vs. limit=15.0 2024-08-10 04:37:28,614 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 17 from LS+wenet, 24 from Vox, 43 from AS 2024-08-10 04:37:54,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8300, loss[loss=0.1056, beats_loss=0.01366, ecapa_loss=0.0002739, whisper_loss=0.08922, over 20415.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01237, ecapa_loss=0.000289, whisper_loss=0.0994, over 3931290.20 frames. ], batch size: 82, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:37:57,742 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
24 from LS+wenet, 29 from Vox, 41 from AS 2024-08-10 04:38:12,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+01 3.111e+01 3.544e+01 4.051e+01 1.362e+02, threshold=7.088e+01, percent-clipped=2.0 2024-08-10 04:38:15,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=372980.0, ans=0.0 2024-08-10 04:38:18,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=372980.0, ans=0.04949747468305833 2024-08-10 04:38:19,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-08-10 04:38:20,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-10 04:38:38,565 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 04:38:44,673 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 04:38:58,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=373280.0, ans=0.05 2024-08-10 04:39:07,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8350, loss[loss=0.1229, beats_loss=0.01337, ecapa_loss=0.0002993, whisper_loss=0.1066, over 21983.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002885, whisper_loss=0.09899, over 3934160.47 frames. ], batch size: 90, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:39:09,257 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
30 from LS+wenet, 28 from Vox, 35 from AS 2024-08-10 04:39:10,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373380.0, ans=0.125 2024-08-10 04:39:33,130 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.837e-02 2024-08-10 04:39:58,177 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 04:40:04,737 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 10 from Vox, 33 from AS 2024-08-10 04:40:04,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=373680.0, ans=0.0 2024-08-10 04:40:11,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=373780.0, ans=0.125 2024-08-10 04:40:26,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8400, loss[loss=0.1264, beats_loss=0.01038, ecapa_loss=0.0002974, whisper_loss=0.113, over 17369.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.0124, ecapa_loss=0.0002892, whisper_loss=0.09912, over 3940015.44 frames. ], batch size: 67, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:40:29,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2024-08-10 04:40:48,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.937e+01 3.360e+01 3.795e+01 5.469e+01, threshold=6.721e+01, percent-clipped=0.0 2024-08-10 04:41:03,270 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
24 from LS+wenet, 23 from Vox, 29 from AS 2024-08-10 04:41:11,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.019e-02 2024-08-10 04:41:18,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=374080.0, ans=0.125 2024-08-10 04:41:32,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=374180.0, ans=0.0 2024-08-10 04:41:38,946 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 25 from LS+wenet, 10 from Vox, 23 from AS 2024-08-10 04:41:57,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8450, loss[loss=0.1428, beats_loss=0.006803, ecapa_loss=0.0003259, whisper_loss=0.1328, over 20395.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01232, ecapa_loss=0.0002905, whisper_loss=0.09912, over 3935600.05 frames. ], batch size: 77, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:42:26,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374480.0, ans=0.125 2024-08-10 04:42:28,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374480.0, ans=0.1 2024-08-10 04:42:32,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=374480.0, ans=0.0 2024-08-10 04:42:36,502 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 04:42:49,829 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 23 from Vox, 24 from AS 2024-08-10 04:43:04,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. 
limit=15.0 2024-08-10 04:43:06,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374680.0, ans=0.125 2024-08-10 04:43:10,737 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 04:43:13,888 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 04:43:15,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=374780.0, ans=0.125 2024-08-10 04:43:17,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2024-08-10 04:43:19,394 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 04:43:19,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374780.0, ans=0.1 2024-08-10 04:43:27,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8500, loss[loss=0.1198, beats_loss=0.009752, ecapa_loss=0.0002908, whisper_loss=0.1071, over 15272.00 frames. ], tot_loss[loss=0.114, beats_loss=0.0123, ecapa_loss=0.000289, whisper_loss=0.09883, over 3910359.32 frames. ], batch size: 60, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:43:43,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=374880.0, ans=0.125 2024-08-10 04:43:48,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.067e+01 3.351e+01 3.844e+01 5.655e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 04:44:19,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.57 vs. 
limit=10.0 2024-08-10 04:44:51,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.97 vs. limit=22.5 2024-08-10 04:44:54,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0 2024-08-10 04:44:54,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8550, loss[loss=0.1476, beats_loss=0.01077, ecapa_loss=0.0002908, whisper_loss=0.1339, over 23431.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01232, ecapa_loss=0.0002886, whisper_loss=0.09893, over 3898651.92 frames. ], batch size: 88, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:45:27,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=375480.0, ans=0.125 2024-08-10 04:45:43,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-08-10 04:45:45,352 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 16 from Vox, 42 from AS 2024-08-10 04:46:11,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8600, loss[loss=0.1546, beats_loss=0.006459, ecapa_loss=0.0003244, whisper_loss=0.1449, over 14548.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01229, ecapa_loss=0.0002892, whisper_loss=0.09867, over 3883537.97 frames. 
], batch size: 55, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:46:11,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=375880.0, ans=0.0 2024-08-10 04:46:19,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375880.0, ans=0.1 2024-08-10 04:46:27,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.077e+01 3.509e+01 3.969e+01 6.307e+01, threshold=7.019e+01, percent-clipped=0.0 2024-08-10 04:46:45,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=376080.0, ans=0.2 2024-08-10 04:46:45,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376080.0, ans=0.1 2024-08-10 04:46:52,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=376180.0, ans=0.0 2024-08-10 04:46:59,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=376180.0, ans=0.0 2024-08-10 04:47:21,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8650, loss[loss=0.1095, beats_loss=0.01101, ecapa_loss=0.0003394, whisper_loss=0.09507, over 16891.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01235, ecapa_loss=0.0002883, whisper_loss=0.09834, over 3875940.95 frames. ], batch size: 70, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:47:21,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=376380.0, ans=0.125 2024-08-10 04:47:27,096 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 from AS 2024-08-10 04:47:32,875 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
23 from LS+wenet, 17 from Vox, 41 from AS 2024-08-10 04:47:37,132 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS 2024-08-10 04:47:40,117 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 04:47:46,727 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 04:47:55,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376580.0, ans=0.125 2024-08-10 04:47:56,819 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 04:47:59,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=376580.0, ans=0.95 2024-08-10 04:48:27,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=376780.0, ans=0.125 2024-08-10 04:48:31,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8700, loss[loss=0.08474, beats_loss=0.01154, ecapa_loss=0.0002918, whisper_loss=0.07028, over 14938.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01239, ecapa_loss=0.0002881, whisper_loss=0.09791, over 3866231.40 frames. ], batch size: 62, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:48:37,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2024-08-10 04:48:42,985 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 04:48:44,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. 
limit=15.0 2024-08-10 04:48:47,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.537e+01 3.015e+01 3.371e+01 3.912e+01 6.380e+01, threshold=6.741e+01, percent-clipped=0.0 2024-08-10 04:48:55,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2024-08-10 04:48:58,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-10 04:48:59,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=377080.0, ans=0.125 2024-08-10 04:49:04,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2024-08-10 04:49:09,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-10 04:49:12,647 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-10 04:49:13,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2024-08-10 04:49:36,149 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 04:49:39,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8750, loss[loss=0.1137, beats_loss=0.01136, ecapa_loss=0.0002736, whisper_loss=0.09961, over 21965.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01234, ecapa_loss=0.0002887, whisper_loss=0.09771, over 3830889.72 frames. ], batch size: 86, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:49:43,984 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 04:50:05,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=377580.0, ans=0.05 2024-08-10 04:50:14,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=377580.0, ans=0.025 2024-08-10 04:50:14,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=15.0 2024-08-10 04:50:25,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=377680.0, ans=0.04949747468305833 2024-08-10 04:50:28,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2024-08-10 04:50:42,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=377780.0, ans=0.07 2024-08-10 04:50:45,544 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 04:50:47,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8800, loss[loss=0.09582, beats_loss=0.01509, ecapa_loss=0.0002492, whisper_loss=0.07823, over 22796.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01238, ecapa_loss=0.0002887, whisper_loss=0.09785, over 3846596.07 frames. ], batch size: 93, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:50:49,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377880.0, ans=0.1 2024-08-10 04:50:54,859 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 22 from Vox, 20 from AS 2024-08-10 04:51:04,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.131e+01 3.473e+01 4.096e+01 6.875e+01, threshold=6.946e+01, percent-clipped=1.0 2024-08-10 04:51:11,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=377980.0, ans=0.2 2024-08-10 04:51:20,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.31 vs. limit=22.5 2024-08-10 04:51:36,563 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 27 from Vox, 18 from AS 2024-08-10 04:51:55,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-10 04:51:57,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8850, loss[loss=0.1271, beats_loss=0.00733, ecapa_loss=0.0004128, whisper_loss=0.1157, over 17188.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01236, ecapa_loss=0.0002892, whisper_loss=0.09821, over 3813382.86 frames. ], batch size: 71, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:52:03,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=378380.0, ans=0.0 2024-08-10 04:52:23,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378480.0, ans=0.1 2024-08-10 04:52:25,729 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 04:52:31,053 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-10 04:52:35,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=378580.0, ans=0.125 2024-08-10 04:52:35,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=378580.0, ans=0.125 2024-08-10 04:53:03,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=378780.0, ans=0.04949747468305833 2024-08-10 04:53:04,687 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 04:53:05,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8900, loss[loss=0.1219, beats_loss=0.01142, ecapa_loss=0.0003062, whisper_loss=0.1074, over 21465.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01256, ecapa_loss=0.0002863, whisper_loss=0.09753, over 3834935.11 frames. ], batch size: 86, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:53:10,044 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 from AS 2024-08-10 04:53:22,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 3.017e+01 3.379e+01 3.848e+01 7.752e+01, threshold=6.759e+01, percent-clipped=1.0 2024-08-10 04:53:58,220 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 32 from LS+wenet, 19 from Vox, 24 from AS 2024-08-10 04:53:58,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=379180.0, ans=0.2 2024-08-10 04:54:12,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=22.5 2024-08-10 04:54:14,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 8950, loss[loss=0.1082, beats_loss=0.01352, ecapa_loss=0.0003109, whisper_loss=0.09157, over 20969.00 frames. 
], tot_loss[loss=0.1126, beats_loss=0.01249, ecapa_loss=0.0002879, whisper_loss=0.09725, over 3839163.44 frames. ], batch size: 86, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:54:16,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=379380.0, ans=0.125 2024-08-10 04:54:16,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=379380.0, ans=0.125 2024-08-10 04:54:24,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=379380.0, ans=0.125 2024-08-10 04:54:26,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=379480.0, ans=0.125 2024-08-10 04:55:22,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9000, loss[loss=0.1383, beats_loss=0.01042, ecapa_loss=0.0003143, whisper_loss=0.1248, over 19411.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.0124, ecapa_loss=0.0002881, whisper_loss=0.09839, over 3871274.35 frames. ], batch size: 75, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:55:22,695 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 04:56:01,337 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on ASR_libri: loss=0.2773, beats_loss=0, ecapa_loss=0.0008691, whisper_loss=0.2686, over 922467.00 frames. 2024-08-10 04:56:19,225 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on SV_voxceleb1: loss=0.007577, beats_loss=0, ecapa_loss=0.0007577, whisper_loss=0, over 939242.00 frames. 2024-08-10 04:58:16,688 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on AT_audioset: loss=0.02874, beats_loss=0.02874, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 04:58:16,692 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 04:58:20,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=379880.0, ans=0.1 2024-08-10 04:58:29,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=379980.0, ans=0.125 2024-08-10 04:58:30,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=379980.0, ans=0.0 2024-08-10 04:58:30,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379980.0, ans=0.1 2024-08-10 04:58:33,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 3.022e+01 3.372e+01 4.052e+01 6.376e+01, threshold=6.745e+01, percent-clipped=0.0 2024-08-10 04:58:33,294 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS 2024-08-10 04:58:55,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=380080.0, ans=0.015 2024-08-10 04:58:55,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=380080.0, ans=0.2 2024-08-10 04:59:01,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380180.0, ans=0.1 2024-08-10 04:59:09,678 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 04:59:25,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=380380.0, ans=0.125 2024-08-10 04:59:25,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9050, loss[loss=0.09087, beats_loss=0.01538, ecapa_loss=0.0002547, whisper_loss=0.07294, over 18926.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01244, ecapa_loss=0.0002889, whisper_loss=0.09829, over 3886283.46 frames. ], batch size: 75, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 04:59:40,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=380480.0, ans=0.0 2024-08-10 04:59:41,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=380480.0, ans=0.2 2024-08-10 04:59:58,974 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 from AS 2024-08-10 05:00:14,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380680.0, ans=0.1 2024-08-10 05:00:18,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-08-10 05:00:21,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=380780.0, ans=10.0 2024-08-10 05:00:34,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9100, loss[loss=0.09579, beats_loss=0.01274, ecapa_loss=0.0002883, whisper_loss=0.08016, over 18858.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.0002895, whisper_loss=0.09887, over 3911480.46 frames. 
], batch size: 76, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:00:41,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-10 05:00:51,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.817e+01 3.235e+01 3.647e+01 7.816e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 05:00:55,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=380980.0, ans=0.0 2024-08-10 05:01:05,034 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 from AS 2024-08-10 05:01:13,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=381080.0, ans=0.0 2024-08-10 05:01:21,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=381180.0, ans=0.0 2024-08-10 05:01:24,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=381180.0, ans=0.1 2024-08-10 05:01:35,726 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 from AS 2024-08-10 05:01:36,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.75 vs. limit=22.5 2024-08-10 05:01:43,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9150, loss[loss=0.149, beats_loss=0.008323, ecapa_loss=0.0002706, whisper_loss=0.138, over 23974.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01229, ecapa_loss=0.0002877, whisper_loss=0.09909, over 3917030.02 frames. ], batch size: 91, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:01:46,356 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 14 from Vox, 29 from AS 2024-08-10 05:01:46,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=381380.0, ans=10.0 2024-08-10 05:02:07,262 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-10 05:02:08,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=381480.0, ans=0.0 2024-08-10 05:02:22,572 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 05:02:30,731 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 05:02:37,773 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 05:02:39,121 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS 2024-08-10 05:02:52,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9200, loss[loss=0.1051, beats_loss=0.01226, ecapa_loss=0.0003517, whisper_loss=0.08927, over 21391.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01229, ecapa_loss=0.0002905, whisper_loss=0.0993, over 3908731.04 frames. ], batch size: 91, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:02:55,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=381880.0, ans=0.0 2024-08-10 05:03:08,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 3.038e+01 3.317e+01 3.849e+01 8.293e+01, threshold=6.633e+01, percent-clipped=1.0 2024-08-10 05:03:13,249 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 from AS 2024-08-10 05:03:33,725 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 16 from Vox, 39 from AS 2024-08-10 05:03:49,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2024-08-10 05:03:58,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=382280.0, ans=0.0 2024-08-10 05:04:00,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9250, loss[loss=0.1092, beats_loss=0.01257, ecapa_loss=0.0002741, whisper_loss=0.09385, over 17053.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002924, whisper_loss=0.09886, over 3909114.48 frames. ], batch size: 66, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:04:02,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382380.0, ans=0.1 2024-08-10 05:04:05,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=382380.0, ans=0.0 2024-08-10 05:04:16,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=382480.0, ans=0.125 2024-08-10 05:04:36,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=382580.0, ans=15.0 2024-08-10 05:04:52,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=382680.0, ans=0.125 2024-08-10 05:04:57,250 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 05:05:09,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9300, loss[loss=0.1145, beats_loss=0.01325, ecapa_loss=0.000245, whisper_loss=0.09878, over 18702.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01232, ecapa_loss=0.0002902, whisper_loss=0.09897, over 3900550.48 frames. 
], batch size: 71, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:05:15,472 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 from AS 2024-08-10 05:05:26,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.054e+01 3.480e+01 4.164e+01 1.138e+02, threshold=6.960e+01, percent-clipped=2.0 2024-08-10 05:05:35,945 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 from AS 2024-08-10 05:05:43,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.99 vs. limit=22.5 2024-08-10 05:05:45,960 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 05:06:15,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=383280.0, ans=0.02 2024-08-10 05:06:16,373 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 05:06:16,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=383280.0, ans=0.0 2024-08-10 05:06:18,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9350, loss[loss=0.1283, beats_loss=0.01042, ecapa_loss=0.0003659, whisper_loss=0.1142, over 22498.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01229, ecapa_loss=0.0002922, whisper_loss=0.09931, over 3893044.34 frames. 
], batch size: 93, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:06:26,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=383380.0, ans=0.0 2024-08-10 05:06:43,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=383480.0, ans=0.09899494936611666 2024-08-10 05:06:48,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=383580.0, ans=0.125 2024-08-10 05:06:56,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=383580.0, ans=0.125 2024-08-10 05:07:01,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383680.0, ans=0.125 2024-08-10 05:07:03,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=383680.0, ans=0.05 2024-08-10 05:07:15,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383780.0, ans=0.1 2024-08-10 05:07:20,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=383780.0, ans=10.0 2024-08-10 05:07:21,022 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 from AS 2024-08-10 05:07:21,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=383780.0, ans=0.125 2024-08-10 05:07:29,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9400, loss[loss=0.08701, beats_loss=0.01453, ecapa_loss=0.0003312, whisper_loss=0.06916, over 20057.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01238, ecapa_loss=0.0002911, whisper_loss=0.09811, over 3860576.51 frames. 
], batch size: 86, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:07:32,375 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 05:07:34,985 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 05:07:40,532 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 05:07:45,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.835e+01 3.411e+01 3.975e+01 7.515e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-10 05:07:48,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 05:07:56,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=384080.0, ans=0.125 2024-08-10 05:08:04,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384080.0, ans=0.125 2024-08-10 05:08:05,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=384080.0, ans=0.2 2024-08-10 05:08:37,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9450, loss[loss=0.1233, beats_loss=0.01256, ecapa_loss=0.0003399, whisper_loss=0.1074, over 21289.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01244, ecapa_loss=0.0002921, whisper_loss=0.09899, over 3889952.63 frames. ], batch size: 88, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:08:51,482 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 05:08:55,520 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 05:09:10,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=384580.0, ans=0.035 2024-08-10 05:09:12,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384580.0, ans=0.1 2024-08-10 05:09:45,405 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 05:09:46,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9500, loss[loss=0.113, beats_loss=0.01139, ecapa_loss=0.0003088, whisper_loss=0.09852, over 15572.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01248, ecapa_loss=0.0002924, whisper_loss=0.09864, over 3895679.67 frames. ], batch size: 62, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:09:49,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=384880.0, ans=0.125 2024-08-10 05:09:50,992 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 05:09:54,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=384880.0, ans=0.0 2024-08-10 05:09:57,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=384880.0, ans=0.07 2024-08-10 05:10:00,930 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 05:10:01,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384980.0, ans=0.1 2024-08-10 05:10:03,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.035e+01 3.445e+01 3.941e+01 9.468e+01, threshold=6.890e+01, percent-clipped=2.0 2024-08-10 05:10:17,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=385080.0, ans=0.0 2024-08-10 05:10:23,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385080.0, ans=0.125 2024-08-10 05:10:38,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=385180.0, ans=0.2 2024-08-10 05:10:45,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2024-08-10 05:10:53,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.17 vs. limit=22.5 2024-08-10 05:10:55,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9550, loss[loss=0.1227, beats_loss=0.01179, ecapa_loss=0.0003441, whisper_loss=0.1074, over 21980.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.0124, ecapa_loss=0.000292, whisper_loss=0.0983, over 3884144.97 frames. 
], batch size: 91, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:11:00,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=385380.0, ans=0.0 2024-08-10 05:11:06,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=385380.0, ans=0.07 2024-08-10 05:11:18,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385480.0, ans=0.1 2024-08-10 05:11:28,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=385580.0, ans=0.125 2024-08-10 05:11:32,194 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS 2024-08-10 05:11:33,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=385580.0, ans=0.2 2024-08-10 05:11:57,769 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 23 from Vox, 25 from AS 2024-08-10 05:12:04,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9600, loss[loss=0.113, beats_loss=0.01218, ecapa_loss=0.000248, whisper_loss=0.0983, over 19802.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01236, ecapa_loss=0.0002926, whisper_loss=0.09829, over 3848664.88 frames. ], batch size: 76, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:12:21,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 3.010e+01 3.489e+01 4.021e+01 7.106e+01, threshold=6.979e+01, percent-clipped=1.0 2024-08-10 05:12:22,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-08-10 05:12:34,628 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 05:12:35,922 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 05:12:38,600 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 from AS 2024-08-10 05:13:11,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=386280.0, ans=0.0 2024-08-10 05:13:14,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9650, loss[loss=0.1048, beats_loss=0.01345, ecapa_loss=0.0002514, whisper_loss=0.08879, over 15214.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01239, ecapa_loss=0.0002907, whisper_loss=0.09791, over 3877944.45 frames. ], batch size: 59, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:13:24,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=386380.0, ans=0.125 2024-08-10 05:13:30,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=386480.0, ans=0.05 2024-08-10 05:13:32,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386480.0, ans=0.1 2024-08-10 05:13:32,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=15.0 2024-08-10 05:13:47,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=386580.0, ans=0.0 2024-08-10 05:13:58,205 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
18 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 05:14:01,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=386680.0, ans=0.125 2024-08-10 05:14:14,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0 2024-08-10 05:14:24,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9700, loss[loss=0.1167, beats_loss=0.0135, ecapa_loss=0.0002782, whisper_loss=0.1004, over 18665.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01235, ecapa_loss=0.0002925, whisper_loss=0.09861, over 3874652.98 frames. ], batch size: 75, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:14:24,854 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 05:14:27,491 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 05:14:35,534 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 29 from Vox, 27 from AS 2024-08-10 05:14:35,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=386880.0, ans=0.125 2024-08-10 05:14:37,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386980.0, ans=0.125 2024-08-10 05:14:40,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.854e+01 3.317e+01 3.898e+01 6.731e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 05:14:41,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. 
limit=15.0 2024-08-10 05:14:46,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=386980.0, ans=0.125 2024-08-10 05:14:48,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2024-08-10 05:15:33,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9750, loss[loss=0.1041, beats_loss=0.01429, ecapa_loss=0.0003181, whisper_loss=0.08661, over 18853.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01226, ecapa_loss=0.0002914, whisper_loss=0.09901, over 3836202.21 frames. ], batch size: 79, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:15:55,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=387480.0, ans=0.125 2024-08-10 05:15:55,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.42 vs. limit=22.5 2024-08-10 05:16:02,125 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 from AS 2024-08-10 05:16:05,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-10 05:16:26,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.93 vs. 
limit=22.5 2024-08-10 05:16:31,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=387780.0, ans=0.125 2024-08-10 05:16:33,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=387780.0, ans=0.04949747468305833 2024-08-10 05:16:43,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9800, loss[loss=0.1012, beats_loss=0.01326, ecapa_loss=0.0003101, whisper_loss=0.08484, over 18319.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01227, ecapa_loss=0.0002899, whisper_loss=0.09884, over 3822782.81 frames. ], batch size: 75, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:16:59,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.835e+01 3.207e+01 3.802e+01 6.736e+01, threshold=6.414e+01, percent-clipped=1.0 2024-08-10 05:17:08,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387980.0, ans=0.1 2024-08-10 05:17:16,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=388080.0, ans=0.0 2024-08-10 05:17:22,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388080.0, ans=0.1 2024-08-10 05:17:30,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.68 vs. limit=10.0 2024-08-10 05:17:33,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=388180.0, ans=0.07 2024-08-10 05:17:51,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9850, loss[loss=0.1124, beats_loss=0.01358, ecapa_loss=0.000252, whisper_loss=0.09633, over 20800.00 frames. 
], tot_loss[loss=0.1136, beats_loss=0.01239, ecapa_loss=0.000287, whisper_loss=0.09837, over 3845143.65 frames. ], batch size: 83, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:18:02,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=388380.0, ans=0.0 2024-08-10 05:18:06,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=388480.0, ans=0.125 2024-08-10 05:18:11,507 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 05:18:12,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=388480.0, ans=0.125 2024-08-10 05:18:39,775 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 14 from Vox, 51 from AS 2024-08-10 05:18:44,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=12.0 2024-08-10 05:18:49,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=12.0 2024-08-10 05:19:00,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9900, loss[loss=0.1206, beats_loss=0.01217, ecapa_loss=0.0002426, whisper_loss=0.106, over 20653.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01236, ecapa_loss=0.0002894, whisper_loss=0.09894, over 3880543.73 frames. ], batch size: 78, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:19:02,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. 
limit=10.0 2024-08-10 05:19:17,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.911e+01 3.357e+01 3.805e+01 2.149e+02, threshold=6.715e+01, percent-clipped=2.0 2024-08-10 05:19:17,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=388980.0, ans=0.0 2024-08-10 05:19:19,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388980.0, ans=0.1 2024-08-10 05:19:23,149 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 from AS 2024-08-10 05:19:26,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388980.0, ans=0.1 2024-08-10 05:19:37,108 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-10 05:19:39,771 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 from AS 2024-08-10 05:19:40,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=389080.0, ans=0.0 2024-08-10 05:19:40,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=389080.0, ans=0.05 2024-08-10 05:19:48,262 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 16 from Vox, 42 from AS 2024-08-10 05:19:59,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=389280.0, ans=0.125 2024-08-10 05:20:06,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=389280.0, ans=0.1 2024-08-10 05:20:09,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389380.0, ans=0.125 2024-08-10 05:20:10,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 9950, loss[loss=0.1309, beats_loss=0.0108, ecapa_loss=0.0003055, whisper_loss=0.1171, over 23642.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01248, ecapa_loss=0.000288, whisper_loss=0.09774, over 3857743.78 frames. ], batch size: 91, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:20:21,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=389380.0, ans=10.0 2024-08-10 05:20:22,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=389480.0, ans=0.0 2024-08-10 05:20:28,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=389480.0, ans=0.125 2024-08-10 05:20:38,676 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 from AS 2024-08-10 05:20:46,533 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 05:20:48,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=389580.0, ans=0.125 2024-08-10 05:21:16,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=389880.0, ans=0.2 2024-08-10 05:21:17,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10000, loss[loss=0.1091, beats_loss=0.01215, ecapa_loss=0.0003101, whisper_loss=0.09382, over 17572.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01246, ecapa_loss=0.0002933, whisper_loss=0.09852, over 3882776.74 frames. ], batch size: 74, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:21:23,441 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 05:21:34,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.053e+01 3.527e+01 4.199e+01 1.415e+02, threshold=7.054e+01, percent-clipped=3.0 2024-08-10 05:21:35,281 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 05:21:39,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=389980.0, ans=0.125 2024-08-10 05:21:40,773 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 05:21:46,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=390080.0, ans=0.0 2024-08-10 05:21:54,632 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 13 from Vox, 23 from AS 2024-08-10 05:22:01,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=390180.0, ans=0.2 2024-08-10 05:22:05,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=390180.0, ans=0.1 2024-08-10 05:22:09,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=390180.0, ans=0.125 2024-08-10 05:22:21,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=390280.0, ans=0.05 2024-08-10 05:22:22,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=390280.0, ans=0.0 2024-08-10 05:22:23,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=390280.0, ans=0.125 2024-08-10 05:22:27,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10050, loss[loss=0.1178, beats_loss=0.00958, ecapa_loss=0.0002494, whisper_loss=0.1058, over 15748.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01249, ecapa_loss=0.0002918, whisper_loss=0.09791, over 3884032.10 frames. ], batch size: 57, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:22:31,735 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 05:22:46,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=390480.0, ans=0.125 2024-08-10 05:23:04,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.30 vs. 
limit=15.0 2024-08-10 05:23:15,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=390680.0, ans=0.125 2024-08-10 05:23:35,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10100, loss[loss=0.1154, beats_loss=0.009874, ecapa_loss=0.0003015, whisper_loss=0.1025, over 13603.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01244, ecapa_loss=0.0002928, whisper_loss=0.09846, over 3903751.09 frames. ], batch size: 53, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:23:44,124 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 05:23:46,518 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 05:23:51,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.903e+01 3.270e+01 3.742e+01 9.283e+01, threshold=6.541e+01, percent-clipped=1.0 2024-08-10 05:23:58,124 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 05:24:01,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=391080.0, ans=0.2 2024-08-10 05:24:29,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=391280.0, ans=0.0 2024-08-10 05:24:35,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=391280.0, ans=0.07 2024-08-10 05:24:44,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10150, loss[loss=0.1421, beats_loss=0.01031, ecapa_loss=0.00023, whisper_loss=0.1295, over 18845.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.0002931, whisper_loss=0.09882, over 3919308.90 frames. ], batch size: 67, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:24:47,950 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
30 from LS+wenet, 15 from Vox, 26 from AS 2024-08-10 05:25:10,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2024-08-10 05:25:18,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=391580.0, ans=0.0 2024-08-10 05:25:19,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=391580.0, ans=0.125 2024-08-10 05:25:27,788 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 21 from LS+wenet, 29 from Vox, 45 from AS 2024-08-10 05:25:31,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2024-08-10 05:25:36,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=391680.0, ans=0.0 2024-08-10 05:25:47,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=391780.0, ans=0.125 2024-08-10 05:25:54,880 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.123e+00 2024-08-10 05:25:57,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10200, loss[loss=0.1054, beats_loss=0.01407, ecapa_loss=0.0002811, whisper_loss=0.08851, over 18689.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01241, ecapa_loss=0.0002936, whisper_loss=0.09792, over 3874322.97 frames. ], batch size: 75, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:26:01,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. 
limit=15.0 2024-08-10 05:26:11,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=391980.0, ans=0.0 2024-08-10 05:26:14,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.916e+01 3.286e+01 3.891e+01 7.167e+01, threshold=6.572e+01, percent-clipped=1.0 2024-08-10 05:26:14,666 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS 2024-08-10 05:26:43,681 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 05:27:12,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10250, loss[loss=0.1222, beats_loss=0.01126, ecapa_loss=0.0003313, whisper_loss=0.1077, over 22450.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.0125, ecapa_loss=0.0002916, whisper_loss=0.09813, over 3902710.37 frames. ], batch size: 92, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:27:18,231 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 from AS 2024-08-10 05:27:50,568 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 05:27:52,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=392580.0, ans=0.0 2024-08-10 05:27:57,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-10 05:28:26,552 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 05:28:27,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10300, loss[loss=0.1413, beats_loss=0.01171, ecapa_loss=0.0002819, whisper_loss=0.1268, over 19952.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01244, ecapa_loss=0.0002898, whisper_loss=0.0986, over 3908865.47 frames. 
], batch size: 77, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:28:46,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.063e+01 3.413e+01 3.835e+01 1.358e+02, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 05:28:54,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-10 05:29:05,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=12.0 2024-08-10 05:29:35,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=393280.0, ans=0.125 2024-08-10 05:29:38,946 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.849e+01 2024-08-10 05:29:45,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10350, loss[loss=0.14, beats_loss=0.01283, ecapa_loss=0.0002454, whisper_loss=0.1247, over 23716.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01247, ecapa_loss=0.0002891, whisper_loss=0.09857, over 3909987.11 frames. ], batch size: 92, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:29:50,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-08-10 05:29:56,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=393380.0, ans=0.0 2024-08-10 05:30:04,182 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 23 from Vox, 29 from AS 2024-08-10 05:30:04,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-08-10 05:30:28,608 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
27 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 05:30:39,063 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 05:30:55,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=393780.0, ans=0.125 2024-08-10 05:31:03,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10400, loss[loss=0.1165, beats_loss=0.01219, ecapa_loss=0.0003024, whisper_loss=0.1013, over 18935.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01231, ecapa_loss=0.0002918, whisper_loss=0.09951, over 3921481.57 frames. ], batch size: 75, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:31:09,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=393880.0, ans=0.125 2024-08-10 05:31:12,250 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 05:31:13,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=393880.0, ans=0.125 2024-08-10 05:31:17,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393980.0, ans=0.1 2024-08-10 05:31:21,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.940e+01 3.359e+01 3.808e+01 2.361e+02, threshold=6.718e+01, percent-clipped=2.0 2024-08-10 05:31:27,968 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 05:31:28,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.11 vs. 
limit=22.5 2024-08-10 05:31:28,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=393980.0, ans=15.0 2024-08-10 05:31:29,704 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 22 from Vox, 20 from AS 2024-08-10 05:31:33,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=394080.0, ans=0.125 2024-08-10 05:32:16,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10450, loss[loss=0.09727, beats_loss=0.01167, ecapa_loss=0.0003149, whisper_loss=0.08246, over 14596.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01238, ecapa_loss=0.0002897, whisper_loss=0.09861, over 3909794.78 frames. ], batch size: 59, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:32:29,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=394480.0, ans=0.125 2024-08-10 05:32:41,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=394480.0, ans=0.07 2024-08-10 05:32:42,651 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 from AS 2024-08-10 05:33:02,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394680.0, ans=0.1 2024-08-10 05:33:05,725 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS 2024-08-10 05:33:23,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=394780.0, ans=0.125 2024-08-10 05:33:28,509 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
17 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 05:33:31,283 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10500, loss[loss=0.1159, beats_loss=0.01467, ecapa_loss=0.0002045, whisper_loss=0.09916, over 25100.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01239, ecapa_loss=0.00029, whisper_loss=0.09902, over 3880397.72 frames. ], batch size: 95, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:33:35,945 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 05:33:45,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=394980.0, ans=0.0 2024-08-10 05:33:49,543 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.533e+01 2.971e+01 3.381e+01 3.721e+01 5.999e+01, threshold=6.761e+01, percent-clipped=0.0 2024-08-10 05:33:49,836 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-10 05:34:09,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-10 05:34:29,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0 2024-08-10 05:34:36,971 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 05:34:44,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395280.0, ans=0.1 2024-08-10 05:34:46,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10550, loss[loss=0.1151, beats_loss=0.01223, ecapa_loss=0.0003459, whisper_loss=0.09938, over 16358.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01243, ecapa_loss=0.0002904, whisper_loss=0.09856, over 3880006.85 frames. 
], batch size: 69, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:34:47,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.580e-02 2024-08-10 05:34:52,685 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 05:34:57,290 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 05:35:11,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395480.0, ans=0.125 2024-08-10 05:35:20,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395580.0, ans=0.125 2024-08-10 05:35:33,677 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 05:35:33,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=395680.0, ans=0.0 2024-08-10 05:36:02,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10600, loss[loss=0.1101, beats_loss=0.01067, ecapa_loss=0.0002863, whisper_loss=0.09657, over 17023.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01244, ecapa_loss=0.0002906, whisper_loss=0.09823, over 3880542.72 frames. ], batch size: 64, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:36:03,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.72 vs. limit=10.0 2024-08-10 05:36:04,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=395880.0, ans=0.2 2024-08-10 05:36:12,481 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 05:36:18,593 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 05:36:19,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.978e+01 3.470e+01 3.932e+01 9.831e+01, threshold=6.940e+01, percent-clipped=1.0 2024-08-10 05:36:24,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-10 05:36:33,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-08-10 05:36:37,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=396080.0, ans=0.09899494936611666 2024-08-10 05:36:40,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=396080.0, ans=0.125 2024-08-10 05:36:40,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=396080.0, ans=0.0 2024-08-10 05:37:14,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-10 05:37:17,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10650, loss[loss=0.113, beats_loss=0.01329, ecapa_loss=0.0002587, whisper_loss=0.09717, over 23245.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01245, ecapa_loss=0.000288, whisper_loss=0.09822, over 3881460.57 frames. ], batch size: 93, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:37:27,585 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 05:37:28,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.15 vs. 
limit=15.0 2024-08-10 05:38:02,676 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 05:38:17,926 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 05:38:24,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396780.0, ans=0.1 2024-08-10 05:38:28,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=396780.0, ans=0.0 2024-08-10 05:38:32,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10700, loss[loss=0.1365, beats_loss=0.01125, ecapa_loss=0.0003166, whisper_loss=0.1221, over 22184.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01233, ecapa_loss=0.0002856, whisper_loss=0.09922, over 3888811.52 frames. ], batch size: 90, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:38:49,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.168e+01 3.517e+01 4.154e+01 8.442e+01, threshold=7.034e+01, percent-clipped=1.0 2024-08-10 05:38:52,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396980.0, ans=0.1 2024-08-10 05:39:07,598 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 05:39:25,147 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 05:39:37,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=397280.0, ans=0.95 2024-08-10 05:39:45,374 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 05:39:46,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=397280.0, ans=22.5 2024-08-10 05:39:47,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10750, loss[loss=0.106, beats_loss=0.01204, ecapa_loss=0.0002794, whisper_loss=0.09112, over 16752.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01228, ecapa_loss=0.0002844, whisper_loss=0.09965, over 3895284.27 frames. ], batch size: 70, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:40:01,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=397480.0, ans=0.125 2024-08-10 05:40:47,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=397780.0, ans=0.0 2024-08-10 05:41:02,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10800, loss[loss=0.1372, beats_loss=0.01043, ecapa_loss=0.0002689, whisper_loss=0.1241, over 22439.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0122, ecapa_loss=0.0002847, whisper_loss=0.1003, over 3879708.92 frames. ], batch size: 87, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:41:20,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.911e+01 3.259e+01 3.950e+01 6.115e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-10 05:41:28,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=397980.0, ans=0.0 2024-08-10 05:41:30,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397980.0, ans=0.1 2024-08-10 05:41:37,720 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 05:42:03,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=12.0 2024-08-10 05:42:18,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10850, loss[loss=0.1059, beats_loss=0.013, ecapa_loss=0.0003399, whisper_loss=0.08952, over 19139.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01219, ecapa_loss=0.0002858, whisper_loss=0.1008, over 3915099.97 frames. ], batch size: 81, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:42:23,554 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 05:42:24,892 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 05:42:34,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=398480.0, ans=0.125 2024-08-10 05:42:40,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=15.0 2024-08-10 05:43:04,804 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 05:43:24,904 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 05:43:28,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-10 05:43:35,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10900, loss[loss=0.0789, beats_loss=0.01405, ecapa_loss=0.0003089, whisper_loss=0.06176, over 12804.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01214, ecapa_loss=0.0002882, whisper_loss=0.1004, over 3919490.12 frames. 
], batch size: 53, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:43:53,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.145e+01 3.517e+01 3.996e+01 1.577e+02, threshold=7.034e+01, percent-clipped=2.0 2024-08-10 05:44:01,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398980.0, ans=0.125 2024-08-10 05:44:03,025 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 05:44:06,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=399080.0, ans=0.0 2024-08-10 05:44:10,645 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 05:44:10,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=399080.0, ans=0.2 2024-08-10 05:44:17,182 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 05:44:24,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=399180.0, ans=0.025 2024-08-10 05:44:37,342 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 05:44:39,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=399280.0, ans=0.0 2024-08-10 05:44:43,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2024-08-10 05:44:49,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 10950, loss[loss=0.1267, beats_loss=0.0125, ecapa_loss=0.0002744, whisper_loss=0.1115, over 21910.00 frames. 
], tot_loss[loss=0.1159, beats_loss=0.01214, ecapa_loss=0.0002881, whisper_loss=0.1008, over 3927999.14 frames. ], batch size: 88, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:45:01,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=399380.0, ans=0.1 2024-08-10 05:45:05,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399480.0, ans=0.1 2024-08-10 05:45:15,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=399480.0, ans=0.125 2024-08-10 05:45:23,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399580.0, ans=0.125 2024-08-10 05:45:24,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=399580.0, ans=0.0 2024-08-10 05:45:26,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-08-10 05:45:33,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=399580.0, ans=0.125 2024-08-10 05:45:43,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=399680.0, ans=0.0 2024-08-10 05:45:45,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=399680.0, ans=0.0 2024-08-10 05:45:59,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. 
limit=15.0 2024-08-10 05:46:03,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=399780.0, ans=0.0 2024-08-10 05:46:05,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11000, loss[loss=0.1292, beats_loss=0.01282, ecapa_loss=0.0002907, whisper_loss=0.1134, over 22372.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01213, ecapa_loss=0.0002893, whisper_loss=0.1004, over 3931875.76 frames. ], batch size: 82, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:46:07,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=399880.0, ans=0.125 2024-08-10 05:46:26,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.898e+01 3.404e+01 3.976e+01 6.521e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 05:46:33,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5 2024-08-10 05:46:54,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5 2024-08-10 05:47:08,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400280.0, ans=0.1 2024-08-10 05:47:13,508 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 05:47:21,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11050, loss[loss=0.1102, beats_loss=0.0132, ecapa_loss=0.0003065, whisper_loss=0.09396, over 17368.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01219, ecapa_loss=0.0002881, whisper_loss=0.1001, over 3931509.17 frames. ], batch size: 71, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:47:32,081 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 05:47:37,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=400480.0, ans=0.125 2024-08-10 05:47:40,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=400480.0, ans=0.125 2024-08-10 05:47:58,307 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 05:48:07,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2024-08-10 05:48:14,243 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 12 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 05:48:34,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11100, loss[loss=0.1495, beats_loss=0.008178, ecapa_loss=0.0003461, whisper_loss=0.1379, over 19049.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01222, ecapa_loss=0.0002893, whisper_loss=0.09962, over 3920443.54 frames. ], batch size: 72, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:48:35,427 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 05:48:52,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2024-08-10 05:48:52,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.998e+01 3.322e+01 3.680e+01 7.626e+01, threshold=6.644e+01, percent-clipped=1.0 2024-08-10 05:48:58,194 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 05:49:11,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401080.0, ans=0.1 2024-08-10 05:49:20,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=401180.0, ans=0.125 2024-08-10 05:49:34,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=401280.0, ans=0.0 2024-08-10 05:49:37,177 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-10 05:49:47,859 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 05:49:50,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11150, loss[loss=0.1479, beats_loss=0.009603, ecapa_loss=0.0003197, whisper_loss=0.1351, over 16447.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.0121, ecapa_loss=0.0002896, whisper_loss=0.1001, over 3905030.45 frames. ], batch size: 62, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:50:00,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.09 vs. limit=22.5 2024-08-10 05:50:04,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2024-08-10 05:50:06,903 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-10 05:50:17,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=401480.0, ans=0.125 2024-08-10 05:50:31,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.59 vs. 
limit=22.5 2024-08-10 05:50:55,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=401780.0, ans=0.125 2024-08-10 05:51:01,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11200, loss[loss=0.08286, beats_loss=0.01638, ecapa_loss=0.0002572, whisper_loss=0.06391, over 14784.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01221, ecapa_loss=0.0002889, whisper_loss=0.09917, over 3898924.20 frames. ], batch size: 60, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:51:07,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=401880.0, ans=0.125 2024-08-10 05:51:15,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401980.0, ans=0.125 2024-08-10 05:51:18,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.098e+01 3.521e+01 4.109e+01 7.831e+01, threshold=7.041e+01, percent-clipped=1.0 2024-08-10 05:51:21,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401980.0, ans=0.125 2024-08-10 05:51:33,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2024-08-10 05:51:39,761 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 32 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 05:51:51,215 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 05:51:58,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=402180.0, ans=0.125 2024-08-10 05:52:03,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=402280.0, ans=0.2 2024-08-10 05:52:07,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=402280.0, ans=0.1 2024-08-10 05:52:09,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=402280.0, ans=0.1 2024-08-10 05:52:16,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11250, loss[loss=0.1163, beats_loss=0.01097, ecapa_loss=0.000348, whisper_loss=0.1018, over 15851.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01219, ecapa_loss=0.0002898, whisper_loss=0.09911, over 3896253.77 frames. ], batch size: 66, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:52:47,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=402580.0, ans=0.0 2024-08-10 05:52:58,584 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-10 05:53:12,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=402780.0, ans=0.0 2024-08-10 05:53:20,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=402780.0, ans=0.0 2024-08-10 05:53:27,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11300, loss[loss=0.1248, beats_loss=0.007319, ecapa_loss=0.0003059, whisper_loss=0.1144, over 16810.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01216, ecapa_loss=0.0002898, whisper_loss=0.09901, over 3879713.76 frames. 
], batch size: 65, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:53:38,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=402880.0, ans=0.1 2024-08-10 05:53:44,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.975e+01 3.482e+01 3.976e+01 1.269e+02, threshold=6.963e+01, percent-clipped=1.0 2024-08-10 05:53:53,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=402980.0, ans=0.0 2024-08-10 05:54:07,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403080.0, ans=0.1 2024-08-10 05:54:14,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403180.0, ans=0.1 2024-08-10 05:54:18,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=403180.0, ans=0.0 2024-08-10 05:54:26,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.16 vs. limit=15.0 2024-08-10 05:54:33,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=403280.0, ans=0.125 2024-08-10 05:54:36,035 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 05:54:39,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11350, loss[loss=0.0915, beats_loss=0.01785, ecapa_loss=0.0002425, whisper_loss=0.07122, over 22112.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01217, ecapa_loss=0.0002893, whisper_loss=0.09927, over 3877329.51 frames. 
], batch size: 93, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:54:46,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2024-08-10 05:55:27,286 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 05:55:30,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=403680.0, ans=0.125 2024-08-10 05:55:55,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11400, loss[loss=0.1154, beats_loss=0.01322, ecapa_loss=0.0002392, whisper_loss=0.09983, over 14398.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01206, ecapa_loss=0.0002914, whisper_loss=0.0999, over 3890603.31 frames. ], batch size: 55, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:55:57,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=403880.0, ans=0.1 2024-08-10 05:56:01,441 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 05:56:12,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=403980.0, ans=0.0 2024-08-10 05:56:13,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 3.091e+01 3.465e+01 3.981e+01 8.996e+01, threshold=6.931e+01, percent-clipped=1.0 2024-08-10 05:56:18,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403980.0, ans=0.125 2024-08-10 05:56:28,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=404080.0, ans=0.125 2024-08-10 05:56:29,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404080.0, ans=0.125 2024-08-10 05:56:48,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404180.0, ans=0.1 2024-08-10 05:57:03,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=404280.0, ans=0.125 2024-08-10 05:57:08,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11450, loss[loss=0.1416, beats_loss=0.01006, ecapa_loss=0.000297, whisper_loss=0.1285, over 18399.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01211, ecapa_loss=0.0002905, whisper_loss=0.1004, over 3899550.27 frames. 
], batch size: 71, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:57:18,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=404380.0, ans=0.015 2024-08-10 05:57:27,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=404480.0, ans=0.125 2024-08-10 05:58:12,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=404780.0, ans=0.2 2024-08-10 05:58:16,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=404780.0, ans=0.0 2024-08-10 05:58:23,156 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 05:58:24,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11500, loss[loss=0.1004, beats_loss=0.01269, ecapa_loss=0.0002371, whisper_loss=0.08531, over 15346.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01215, ecapa_loss=0.0002911, whisper_loss=0.0998, over 3895413.14 frames. ], batch size: 60, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:58:40,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=404980.0, ans=0.125 2024-08-10 05:58:42,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.195e+01 3.620e+01 4.078e+01 2.789e+02, threshold=7.241e+01, percent-clipped=1.0 2024-08-10 05:59:25,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=405280.0, ans=0.125 2024-08-10 05:59:29,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=405280.0, ans=0.125 2024-08-10 05:59:38,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11550, loss[loss=0.1262, beats_loss=0.00943, ecapa_loss=0.0003058, whisper_loss=0.1137, over 15075.00 frames. 
], tot_loss[loss=0.1145, beats_loss=0.0121, ecapa_loss=0.0002909, whisper_loss=0.09946, over 3896330.47 frames. ], batch size: 58, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:59:40,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=405380.0, ans=0.0 2024-08-10 05:59:40,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=405380.0, ans=0.0 2024-08-10 05:59:58,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=405480.0, ans=0.0 2024-08-10 06:00:01,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=405480.0, ans=0.125 2024-08-10 06:00:24,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=405680.0, ans=0.05 2024-08-10 06:00:34,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5 2024-08-10 06:00:37,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=405680.0, ans=0.04949747468305833 2024-08-10 06:00:54,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11600, loss[loss=0.1101, beats_loss=0.01235, ecapa_loss=0.0003992, whisper_loss=0.09378, over 14198.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01216, ecapa_loss=0.000292, whisper_loss=0.0989, over 3869226.97 frames. 
], batch size: 59, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:01:00,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=405880.0, ans=0.2 2024-08-10 06:01:11,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.361e+01 3.673e+01 4.425e+01 6.331e+01, threshold=7.346e+01, percent-clipped=0.0 2024-08-10 06:01:14,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=28.03 vs. limit=22.5 2024-08-10 06:01:15,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=405980.0, ans=0.0 2024-08-10 06:01:51,000 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 06:02:03,827 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 32 from LS+wenet, 11 from Vox, 33 from AS 2024-08-10 06:02:05,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=406380.0, ans=0.2 2024-08-10 06:02:06,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11650, loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0002658, whisper_loss=0.09326, over 15882.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01211, ecapa_loss=0.0002926, whisper_loss=0.09913, over 3888341.01 frames. ], batch size: 61, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:02:21,061 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
28 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 06:02:29,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=406480.0, ans=0.125 2024-08-10 06:02:43,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=406580.0, ans=0.125 2024-08-10 06:02:48,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-08-10 06:02:50,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406680.0, ans=0.1 2024-08-10 06:03:00,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=406680.0, ans=0.0 2024-08-10 06:03:02,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=406780.0, ans=0.0 2024-08-10 06:03:14,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=406780.0, ans=10.0 2024-08-10 06:03:16,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11700, loss[loss=0.1212, beats_loss=0.01023, ecapa_loss=0.0003105, whisper_loss=0.1079, over 16486.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01221, ecapa_loss=0.0002894, whisper_loss=0.09919, over 3876087.00 frames. ], batch size: 62, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:03:24,484 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 23 from Vox, 21 from AS 2024-08-10 06:03:29,833 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
23 from LS+wenet, 28 from Vox, 42 from AS 2024-08-10 06:03:33,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.451e+01 3.237e+01 3.576e+01 4.266e+01 6.520e+01, threshold=7.151e+01, percent-clipped=0.0 2024-08-10 06:03:35,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=406980.0, ans=0.2 2024-08-10 06:03:41,279 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 19 from Vox, 44 from AS 2024-08-10 06:03:46,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-10 06:03:47,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=407080.0, ans=0.0 2024-08-10 06:03:59,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=407180.0, ans=0.125 2024-08-10 06:04:02,166 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS 2024-08-10 06:04:06,442 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 06:04:09,323 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 06:04:12,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=35.40 vs. limit=15.0 2024-08-10 06:04:29,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11750, loss[loss=0.1446, beats_loss=0.01107, ecapa_loss=0.0002383, whisper_loss=0.1312, over 15517.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01225, ecapa_loss=0.0002877, whisper_loss=0.09916, over 3895604.56 frames.
], batch size: 57, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:04:36,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=407380.0, ans=0.07 2024-08-10 06:04:39,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=407380.0, ans=0.0 2024-08-10 06:04:46,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=407480.0, ans=0.0 2024-08-10 06:04:52,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-10 06:04:53,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=407480.0, ans=0.125 2024-08-10 06:05:20,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=407680.0, ans=0.2 2024-08-10 06:05:26,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 from AS 2024-08-10 06:05:34,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407780.0, ans=0.1 2024-08-10 06:05:36,135 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 06:05:40,272 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 06:05:43,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11800, loss[loss=0.09686, beats_loss=0.01551, ecapa_loss=0.0002349, whisper_loss=0.079, over 22073.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01231, ecapa_loss=0.0002863, whisper_loss=0.09947, over 3913289.39 frames.
], batch size: 89, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:05:58,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407980.0, ans=0.1 2024-08-10 06:05:59,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 3.074e+01 3.455e+01 3.897e+01 7.543e+01, threshold=6.910e+01, percent-clipped=1.0 2024-08-10 06:06:03,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=407980.0, ans=0.5 2024-08-10 06:06:08,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=407980.0, ans=0.0 2024-08-10 06:06:21,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-10 06:06:24,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 06:06:41,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=408280.0, ans=0.125 2024-08-10 06:06:49,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=408280.0, ans=0.125 2024-08-10 06:06:53,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11850, loss[loss=0.1069, beats_loss=0.01297, ecapa_loss=0.0003038, whisper_loss=0.0909, over 21077.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.0123, ecapa_loss=0.0002875, whisper_loss=0.09912, over 3915386.92 frames. ], batch size: 85, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:06:54,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=408380.0, ans=0.2 2024-08-10 06:07:05,073 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
36 from LS+wenet, 26 from Vox, 29 from AS 2024-08-10 06:07:15,155 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 06:07:23,830 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 from AS 2024-08-10 06:07:26,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.89 vs. limit=15.0 2024-08-10 06:07:26,708 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 06:07:44,415 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 06:07:54,857 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:08:01,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0 2024-08-10 06:08:03,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11900, loss[loss=0.09865, beats_loss=0.01399, ecapa_loss=0.0002566, whisper_loss=0.08209, over 19888.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01241, ecapa_loss=0.0002873, whisper_loss=0.09864, over 3938034.61 frames. ], batch size: 79, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:08:14,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=408880.0, ans=0.125 2024-08-10 06:08:16,562 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
31 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 06:08:16,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=408980.0, ans=0.125 2024-08-10 06:08:20,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 3.266e+01 3.553e+01 4.247e+01 1.215e+02, threshold=7.106e+01, percent-clipped=1.0 2024-08-10 06:08:20,827 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 14 from Vox, 34 from AS 2024-08-10 06:08:32,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2024-08-10 06:08:33,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=409080.0, ans=0.1 2024-08-10 06:08:33,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=12.0 2024-08-10 06:08:34,464 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 11 from Vox, 33 from AS 2024-08-10 06:08:46,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=409180.0, ans=0.125 2024-08-10 06:08:49,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2024-08-10 06:08:52,864 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 17 from LS+wenet, 28 from Vox, 52 from AS 2024-08-10 06:08:58,318 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 from AS 2024-08-10 06:09:05,208 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
17 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 06:09:13,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 11950, loss[loss=0.09698, beats_loss=0.0136, ecapa_loss=0.0003308, whisper_loss=0.08007, over 21958.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01233, ecapa_loss=0.0002891, whisper_loss=0.09842, over 3892701.82 frames. ], batch size: 93, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:09:16,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409380.0, ans=0.1 2024-08-10 06:09:21,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=409380.0, ans=0.0 2024-08-10 06:09:36,979 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 06:09:56,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0 2024-08-10 06:10:01,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=409680.0, ans=0.125 2024-08-10 06:10:22,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12000, loss[loss=0.1102, beats_loss=0.01137, ecapa_loss=0.0003568, whisper_loss=0.09529, over 22159.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01239, ecapa_loss=0.000289, whisper_loss=0.09771, over 3874506.07 frames. ], batch size: 93, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:10:22,730 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 06:11:01,969 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on ASR_libri: loss=0.2695, beats_loss=0, ecapa_loss=0.000863, whisper_loss=0.2608, over 922467.00 frames.
2024-08-10 06:11:17,694 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on SV_voxceleb1: loss=0.007635, beats_loss=0, ecapa_loss=0.0007635, whisper_loss=0, over 939242.00 frames. 2024-08-10 06:13:11,107 INFO [train_multi_KD3.py:1149] (3/4) Epoch 3, validation on AT_audioset: loss=0.0284, beats_loss=0.0284, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 06:13:11,118 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 06:13:28,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.155e+01 3.494e+01 4.116e+01 7.765e+01, threshold=6.989e+01, percent-clipped=1.0 2024-08-10 06:13:29,875 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 06:13:36,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=409980.0, ans=0.125 2024-08-10 06:13:47,099 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 18 from Vox, 51 from AS 2024-08-10 06:14:03,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2024-08-10 06:14:12,765 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 06:14:15,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=410280.0, ans=0.2 2024-08-10 06:14:18,249 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 12 from Vox, 44 from AS 2024-08-10 06:14:18,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2024-08-10 06:14:22,078 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
30 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 06:14:23,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12050, loss[loss=0.1166, beats_loss=0.01119, ecapa_loss=0.0002811, whisper_loss=0.1026, over 22513.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01242, ecapa_loss=0.0002879, whisper_loss=0.09747, over 3852124.33 frames. ], batch size: 90, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:15:05,408 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 06:15:08,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410680.0, ans=0.1 2024-08-10 06:15:09,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=410680.0, ans=0.125 2024-08-10 06:15:33,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12100, loss[loss=0.1292, beats_loss=0.009139, ecapa_loss=0.0003431, whisper_loss=0.1166, over 13461.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01235, ecapa_loss=0.0002881, whisper_loss=0.09773, over 3856678.12 frames. ], batch size: 53, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:15:35,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs.
limit=15.0 2024-08-10 06:15:37,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=410880.0, ans=0.0 2024-08-10 06:15:49,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 3.160e+01 3.535e+01 4.240e+01 9.123e+01, threshold=7.071e+01, percent-clipped=3.0 2024-08-10 06:15:58,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=410980.0, ans=0.2 2024-08-10 06:16:12,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=411080.0, ans=0.0 2024-08-10 06:16:21,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411180.0, ans=0.1 2024-08-10 06:16:29,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=411280.0, ans=0.0 2024-08-10 06:16:32,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=411280.0, ans=0.0 2024-08-10 06:16:33,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=411280.0, ans=0.5 2024-08-10 06:16:41,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12150, loss[loss=0.1318, beats_loss=0.007967, ecapa_loss=0.0003667, whisper_loss=0.1201, over 14656.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01233, ecapa_loss=0.0002887, whisper_loss=0.09749, over 3840199.16 frames. ], batch size: 59, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:16:43,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2024-08-10 06:16:46,356 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 06:17:00,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=411480.0, ans=0.0 2024-08-10 06:17:09,536 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 from AS 2024-08-10 06:17:09,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=411580.0, ans=0.2 2024-08-10 06:17:17,724 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 06:17:19,344 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 06:17:33,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.30 vs. limit=22.5 2024-08-10 06:17:50,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12200, loss[loss=0.1154, beats_loss=0.01023, ecapa_loss=0.0002665, whisper_loss=0.1025, over 14856.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01228, ecapa_loss=0.0002889, whisper_loss=0.09789, over 3806657.30 frames. ], batch size: 55, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:17:51,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=411880.0, ans=0.0 2024-08-10 06:17:52,647 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 from AS 2024-08-10 06:17:59,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411880.0, ans=0.1 2024-08-10 06:18:07,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 3.192e+01 3.663e+01 4.187e+01 6.724e+01, threshold=7.326e+01, percent-clipped=0.0 2024-08-10 06:18:24,631 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
22 from LS+wenet, 23 from Vox, 21 from AS 2024-08-10 06:18:32,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=412080.0, ans=0.125 2024-08-10 06:18:42,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=412180.0, ans=0.2 2024-08-10 06:18:49,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-10 06:18:52,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-08-10 06:18:55,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=412280.0, ans=0.0 2024-08-10 06:18:56,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412280.0, ans=0.1 2024-08-10 06:19:03,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12250, loss[loss=0.09558, beats_loss=0.0152, ecapa_loss=0.0003001, whisper_loss=0.07737, over 18572.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01231, ecapa_loss=0.0002894, whisper_loss=0.09829, over 3847304.72 frames. ], batch size: 78, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:19:03,540 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 from AS 2024-08-10 06:19:12,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2024-08-10 06:19:23,140 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts.
23 from LS+wenet, 25 from Vox, 47 from AS 2024-08-10 06:19:28,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-10 06:19:34,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=412580.0, ans=0.125 2024-08-10 06:20:12,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12300, loss[loss=0.1325, beats_loss=0.01176, ecapa_loss=0.0002983, whisper_loss=0.1178, over 23372.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01235, ecapa_loss=0.0002906, whisper_loss=0.09805, over 3874994.71 frames. ], batch size: 90, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:20:18,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=412880.0, ans=0.0 2024-08-10 06:20:19,671 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 from AS 2024-08-10 06:20:28,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.356e+01 3.807e+01 4.575e+01 1.219e+02, threshold=7.614e+01, percent-clipped=2.0 2024-08-10 06:20:44,557 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 06:20:49,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.00 vs.
limit=12.0 2024-08-10 06:20:51,627 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.142e-02 2024-08-10 06:20:52,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413180.0, ans=0.125 2024-08-10 06:20:54,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=413180.0, ans=0.125 2024-08-10 06:21:21,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12350, loss[loss=0.1121, beats_loss=0.01212, ecapa_loss=0.0003285, whisper_loss=0.09667, over 15035.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01231, ecapa_loss=0.0002943, whisper_loss=0.09834, over 3864664.83 frames. ], batch size: 58, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:21:23,433 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 25 from LS+wenet, 14 from Vox, 19 from AS 2024-08-10 06:21:37,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=413480.0, ans=0.1 2024-08-10 06:21:37,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2024-08-10 06:21:42,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=413480.0, ans=0.125 2024-08-10 06:21:51,782 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 from AS 2024-08-10 06:21:53,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs.
limit=15.0 2024-08-10 06:22:05,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=413680.0, ans=0.0 2024-08-10 06:22:08,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0 2024-08-10 06:22:12,095 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 06:22:30,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12400, loss[loss=0.07782, beats_loss=0.01265, ecapa_loss=0.0003082, whisper_loss=0.06209, over 15111.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01228, ecapa_loss=0.0002946, whisper_loss=0.09783, over 3866470.24 frames. ], batch size: 66, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:22:41,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=413880.0, ans=0.125 2024-08-10 06:22:47,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.123e+01 3.503e+01 4.019e+01 1.294e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 06:22:55,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=413980.0, ans=0.125 2024-08-10 06:23:09,087 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 06:23:15,845 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 06:23:19,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=414180.0, ans=0.125 2024-08-10 06:23:36,669 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
23 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 06:23:39,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12450, loss[loss=0.1279, beats_loss=0.01196, ecapa_loss=0.000294, whisper_loss=0.113, over 14454.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01217, ecapa_loss=0.0002944, whisper_loss=0.09804, over 3881736.38 frames. ], batch size: 57, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:07,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-08-10 06:24:13,359 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 06:24:34,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=414780.0, ans=0.125 2024-08-10 06:24:39,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=414780.0, ans=0.0 2024-08-10 06:24:49,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12500, loss[loss=0.108, beats_loss=0.01223, ecapa_loss=0.0002883, whisper_loss=0.09289, over 22791.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01219, ecapa_loss=0.0002943, whisper_loss=0.09806, over 3891766.17 frames.
], batch size: 90, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:52,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414880.0, ans=0.1 2024-08-10 06:25:03,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=414980.0, ans=0.125 2024-08-10 06:25:06,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 3.279e+01 3.697e+01 4.212e+01 5.815e+01, threshold=7.393e+01, percent-clipped=0.0 2024-08-10 06:25:10,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=414980.0, ans=0.05 2024-08-10 06:25:19,646 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 31 from Vox, 39 from AS 2024-08-10 06:25:21,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=415080.0, ans=0.0 2024-08-10 06:25:29,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=415180.0, ans=0.125 2024-08-10 06:25:49,097 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 06:25:49,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-10 06:25:59,051 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.810e-01 2024-08-10 06:26:01,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12550, loss[loss=0.1095, beats_loss=0.01284, ecapa_loss=0.0002738, whisper_loss=0.09395, over 19568.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01222, ecapa_loss=0.0002909, whisper_loss=0.09854, over 3912042.10 frames.
], batch size: 77, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:26:20,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=415480.0, ans=0.125 2024-08-10 06:26:33,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.45 vs. limit=22.5 2024-08-10 06:26:37,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=415580.0, ans=0.0 2024-08-10 06:26:41,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415580.0, ans=0.1 2024-08-10 06:26:51,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=415680.0, ans=0.125 2024-08-10 06:26:53,771 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 30 from Vox, 19 from AS 2024-08-10 06:26:55,303 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 06:26:59,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-08-10 06:27:04,600 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 06:27:14,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12600, loss[loss=0.1223, beats_loss=0.01295, ecapa_loss=0.0002098, whisper_loss=0.1073, over 15732.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01228, ecapa_loss=0.0002882, whisper_loss=0.09829, over 3917001.75 frames. ], batch size: 60, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:27:32,064 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-10 06:27:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=415980.0, ans=0.0 2024-08-10 06:27:35,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 3.161e+01 3.514e+01 4.071e+01 6.890e+01, threshold=7.028e+01, percent-clipped=0.0 2024-08-10 06:27:42,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415980.0, ans=0.125 2024-08-10 06:27:46,070 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 26 from Vox, 26 from AS 2024-08-10 06:27:46,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415980.0, ans=0.1 2024-08-10 06:28:14,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=416180.0, ans=15.0 2024-08-10 06:28:16,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=416180.0, ans=0.125 2024-08-10 06:28:30,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.29 vs. limit=10.0 2024-08-10 06:28:35,633 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 30 from Vox, 38 from AS 2024-08-10 06:28:38,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12650, loss[loss=0.1057, beats_loss=0.01314, ecapa_loss=0.0003272, whisper_loss=0.08934, over 19050.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01229, ecapa_loss=0.0002872, whisper_loss=0.09789, over 3908735.00 frames. 
], batch size: 78, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:28:42,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=416380.0, ans=0.025 2024-08-10 06:28:58,533 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:29:03,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416480.0, ans=0.125 2024-08-10 06:29:05,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=416480.0, ans=0.0 2024-08-10 06:29:06,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=416480.0, ans=0.125 2024-08-10 06:29:14,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2024-08-10 06:29:42,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2024-08-10 06:29:47,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-10 06:30:00,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12700, loss[loss=0.1141, beats_loss=0.01007, ecapa_loss=0.0002977, whisper_loss=0.101, over 19658.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01224, ecapa_loss=0.0002875, whisper_loss=0.09834, over 3901560.89 frames. 
], batch size: 75, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:30:01,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.080e+05 2024-08-10 06:30:04,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416880.0, ans=0.1 2024-08-10 06:30:09,991 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 18 from Vox, 55 from AS 2024-08-10 06:30:15,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-10 06:30:23,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.052e+01 3.376e+01 3.987e+01 6.626e+01, threshold=6.752e+01, percent-clipped=0.0 2024-08-10 06:30:41,960 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 18 from Vox, 42 from AS 2024-08-10 06:30:54,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=417080.0, ans=0.0 2024-08-10 06:31:14,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=417180.0, ans=0.125 2024-08-10 06:31:18,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=417180.0, ans=0.0 2024-08-10 06:31:39,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=417380.0, ans=0.0 2024-08-10 06:31:40,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12750, loss[loss=0.09935, beats_loss=0.01085, ecapa_loss=0.0003431, whisper_loss=0.08506, over 16405.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.0123, ecapa_loss=0.0002897, whisper_loss=0.09832, over 3896655.75 frames. 
], batch size: 66, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:32:29,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=417580.0, ans=0.2 2024-08-10 06:32:29,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2024-08-10 06:32:45,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=15.0 2024-08-10 06:32:55,047 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 06:33:18,101 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 06:33:19,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=417880.0, ans=0.2 2024-08-10 06:33:20,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12800, loss[loss=0.1271, beats_loss=0.01217, ecapa_loss=0.0002326, whisper_loss=0.1126, over 23997.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01237, ecapa_loss=0.000291, whisper_loss=0.09815, over 3896490.80 frames. ], batch size: 90, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:33:34,363 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
22 from LS+wenet, 20 from Vox, 53 from AS 2024-08-10 06:33:42,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.114e+01 3.592e+01 4.168e+01 8.043e+01, threshold=7.184e+01, percent-clipped=1.0 2024-08-10 06:34:13,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=418080.0, ans=0.09899494936611666 2024-08-10 06:34:23,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418180.0, ans=0.1 2024-08-10 06:34:23,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=418180.0, ans=0.125 2024-08-10 06:34:31,223 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 from AS 2024-08-10 06:34:56,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.894e+01 2024-08-10 06:34:57,086 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 06:34:59,443 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12850, loss[loss=0.1203, beats_loss=0.01256, ecapa_loss=0.0002627, whisper_loss=0.1052, over 23359.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.0124, ecapa_loss=0.0002885, whisper_loss=0.09795, over 3901204.66 frames. ], batch size: 95, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:35:00,078 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 06:35:03,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2024-08-10 06:35:07,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=418380.0, ans=0.125 2024-08-10 06:35:10,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-10 06:35:28,057 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 30 from Vox, 31 from AS 2024-08-10 06:35:29,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=418580.0, ans=0.125 2024-08-10 06:35:33,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=418580.0, ans=15.0 2024-08-10 06:35:45,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418680.0, ans=0.1 2024-08-10 06:35:49,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=418680.0, ans=0.04949747468305833 2024-08-10 06:36:09,842 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12900, loss[loss=0.1139, beats_loss=0.01236, ecapa_loss=0.0002561, whisper_loss=0.09899, over 18096.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.0124, ecapa_loss=0.0002869, whisper_loss=0.09708, over 3861950.53 frames. ], batch size: 72, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:36:12,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. 
limit=10.0 2024-08-10 06:36:26,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.152e+01 3.621e+01 4.177e+01 6.125e+01, threshold=7.242e+01, percent-clipped=0.0 2024-08-10 06:36:34,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418980.0, ans=0.1 2024-08-10 06:36:34,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-08-10 06:36:46,479 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS 2024-08-10 06:36:55,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=419180.0, ans=0.025 2024-08-10 06:36:57,633 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 from AS 2024-08-10 06:37:06,188 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 06:37:10,508 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 9 from LS+wenet, 26 from Vox, 34 from AS 2024-08-10 06:37:19,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 12950, loss[loss=0.1126, beats_loss=0.01292, ecapa_loss=0.0003269, whisper_loss=0.0964, over 21461.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01227, ecapa_loss=0.0002887, whisper_loss=0.09737, over 3868315.61 frames. ], batch size: 92, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:37:20,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419380.0, ans=0.125 2024-08-10 06:37:41,867 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 from AS 2024-08-10 06:37:44,593 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-10 06:37:59,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=419680.0, ans=0.0 2024-08-10 06:38:28,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13000, loss[loss=0.1097, beats_loss=0.01462, ecapa_loss=0.0002965, whisper_loss=0.09214, over 21618.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01222, ecapa_loss=0.0002918, whisper_loss=0.09787, over 3872075.94 frames. ], batch size: 90, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:38:29,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-10 06:38:32,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=419880.0, ans=0.125 2024-08-10 06:38:45,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 3.317e+01 3.869e+01 4.527e+01 7.040e+01, threshold=7.738e+01, percent-clipped=0.0 2024-08-10 06:38:54,775 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 22 from Vox, 19 from AS 2024-08-10 06:39:03,826 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 from AS 2024-08-10 06:39:05,354 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 06:39:07,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420080.0, ans=0.1 2024-08-10 06:39:29,190 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 17 from Vox, 48 from AS 2024-08-10 06:39:35,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2024-08-10 06:39:36,448 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS 2024-08-10 06:39:42,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13050, loss[loss=0.1062, beats_loss=0.01485, ecapa_loss=0.0002303, whisper_loss=0.08902, over 22997.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01228, ecapa_loss=0.0002925, whisper_loss=0.09762, over 3882992.68 frames. ], batch size: 90, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:39:55,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=420380.0, ans=0.125 2024-08-10 06:40:15,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-10 06:40:46,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=420780.0, ans=0.0 2024-08-10 06:40:56,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13100, loss[loss=0.1088, beats_loss=0.01253, ecapa_loss=0.0002868, whisper_loss=0.09339, over 22815.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01228, ecapa_loss=0.0002907, whisper_loss=0.09784, over 3889205.82 frames. ], batch size: 94, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:41:14,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.548e+01 3.107e+01 3.501e+01 3.954e+01 7.732e+01, threshold=7.002e+01, percent-clipped=0.0 2024-08-10 06:41:28,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=421080.0, ans=0.125 2024-08-10 06:41:35,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=421080.0, ans=0.125 2024-08-10 06:41:38,237 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
28 from LS+wenet, 22 from Vox, 46 from AS 2024-08-10 06:41:46,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421180.0, ans=0.1 2024-08-10 06:41:56,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421280.0, ans=0.125 2024-08-10 06:42:02,046 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 06:42:11,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=421380.0, ans=0.2 2024-08-10 06:42:12,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13150, loss[loss=0.09943, beats_loss=0.009486, ecapa_loss=0.0003693, whisper_loss=0.08625, over 15856.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01231, ecapa_loss=0.0002898, whisper_loss=0.09781, over 3881745.30 frames. ], batch size: 65, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:42:26,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=421480.0, ans=0.125 2024-08-10 06:42:29,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2024-08-10 06:42:37,172 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS 2024-08-10 06:42:53,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=421580.0, ans=0.0 2024-08-10 06:43:03,459 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 34 from Vox, 29 from AS 2024-08-10 06:43:22,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.68 vs. 
limit=22.5 2024-08-10 06:43:25,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13200, loss[loss=0.09672, beats_loss=0.01543, ecapa_loss=0.0002654, whisper_loss=0.07864, over 22347.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01238, ecapa_loss=0.0002911, whisper_loss=0.09722, over 3883800.17 frames. ], batch size: 94, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:43:27,330 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS 2024-08-10 06:43:42,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 3.092e+01 3.479e+01 4.168e+01 6.203e+01, threshold=6.958e+01, percent-clipped=0.0 2024-08-10 06:44:26,058 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 14 from Vox, 37 from AS 2024-08-10 06:44:41,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13250, loss[loss=0.1496, beats_loss=0.01128, ecapa_loss=0.000238, whisper_loss=0.1359, over 18306.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01225, ecapa_loss=0.0002924, whisper_loss=0.09755, over 3862630.11 frames. ], batch size: 67, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:44:42,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=422380.0, ans=0.2 2024-08-10 06:44:50,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=422380.0, ans=0.0 2024-08-10 06:44:55,190 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 from AS 2024-08-10 06:45:38,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=422680.0, ans=0.125 2024-08-10 06:45:42,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. 
limit=15.0 2024-08-10 06:45:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=422780.0, ans=0.125 2024-08-10 06:45:56,236 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 06:45:57,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13300, loss[loss=0.1323, beats_loss=0.01019, ecapa_loss=0.000298, whisper_loss=0.1191, over 20781.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01221, ecapa_loss=0.0002923, whisper_loss=0.09813, over 3841475.55 frames. ], batch size: 82, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:46:05,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=422880.0, ans=0.05 2024-08-10 06:46:08,257 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 from AS 2024-08-10 06:46:08,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=422880.0, ans=0.2 2024-08-10 06:46:14,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 15 from Vox, 49 from AS 2024-08-10 06:46:14,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=422980.0, ans=0.1 2024-08-10 06:46:15,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 3.388e+01 3.671e+01 4.200e+01 6.497e+01, threshold=7.342e+01, percent-clipped=0.0 2024-08-10 06:46:27,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=423080.0, ans=0.125 2024-08-10 06:46:37,496 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 06:46:46,602 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
20 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 06:46:48,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2024-08-10 06:46:52,624 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 from AS 2024-08-10 06:47:03,828 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 22 from Vox, 21 from AS 2024-08-10 06:47:10,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13350, loss[loss=0.1153, beats_loss=0.01003, ecapa_loss=0.0003102, whisper_loss=0.1022, over 21651.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.0123, ecapa_loss=0.0002897, whisper_loss=0.0974, over 3847575.13 frames. ], batch size: 86, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:47:12,201 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 06:47:35,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423480.0, ans=0.125 2024-08-10 06:47:39,242 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 from AS 2024-08-10 06:47:54,272 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 06:47:58,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=423680.0, ans=0.125 2024-08-10 06:48:01,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=423680.0, ans=0.125 2024-08-10 06:48:24,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13400, loss[loss=0.09456, beats_loss=0.01215, ecapa_loss=0.0002959, whisper_loss=0.07946, over 15791.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.0123, ecapa_loss=0.0002897, whisper_loss=0.09769, over 3857493.53 frames. 
], batch size: 63, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:48:25,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=8.0 2024-08-10 06:48:27,582 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 16 from Vox, 46 from AS 2024-08-10 06:48:29,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=423880.0, ans=0.125 2024-08-10 06:48:39,128 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 29 from Vox, 26 from AS 2024-08-10 06:48:42,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 3.269e+01 3.722e+01 4.193e+01 5.690e+01, threshold=7.444e+01, percent-clipped=0.0 2024-08-10 06:48:48,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423980.0, ans=0.1 2024-08-10 06:49:00,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=424080.0, ans=0.5 2024-08-10 06:49:02,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=424080.0, ans=0.09899494936611666 2024-08-10 06:49:09,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424180.0, ans=0.125 2024-08-10 06:49:21,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-10 06:49:38,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13450, loss[loss=0.1514, beats_loss=0.006882, ecapa_loss=0.000315, whisper_loss=0.1414, over 19639.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01223, ecapa_loss=0.00029, whisper_loss=0.09842, over 3849385.49 frames. 
], batch size: 73, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:49:46,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0 2024-08-10 06:49:52,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424480.0, ans=0.125 2024-08-10 06:49:56,366 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 from AS 2024-08-10 06:50:07,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2024-08-10 06:50:19,321 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 06:50:25,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2024-08-10 06:50:32,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=424680.0, ans=0.0 2024-08-10 06:50:35,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=424780.0, ans=0.125 2024-08-10 06:50:48,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=424780.0, ans=0.125 2024-08-10 06:50:50,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13500, loss[loss=0.1213, beats_loss=0.01404, ecapa_loss=0.0002758, whisper_loss=0.1045, over 22495.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01231, ecapa_loss=0.000291, whisper_loss=0.09756, over 3821235.62 frames. 
], batch size: 93, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:50:51,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-10 06:51:00,611 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 31 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 06:51:00,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424880.0, ans=0.1 2024-08-10 06:51:07,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.316e+01 3.785e+01 4.530e+01 1.081e+02, threshold=7.570e+01, percent-clipped=1.0 2024-08-10 06:51:33,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=425180.0, ans=0.0 2024-08-10 06:51:43,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-10 06:51:43,749 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 06:51:57,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=425280.0, ans=0.0 2024-08-10 06:52:01,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13550, loss[loss=0.1097, beats_loss=0.01217, ecapa_loss=0.000243, whisper_loss=0.09505, over 16562.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.0123, ecapa_loss=0.0002912, whisper_loss=0.0977, over 3840504.12 frames. ], batch size: 63, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:52:13,174 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 15 from Vox, 41 from AS 2024-08-10 06:52:26,329 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
20 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 06:52:29,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-10 06:52:29,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-10 06:53:03,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=425780.0, ans=0.125 2024-08-10 06:53:13,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13600, loss[loss=0.1207, beats_loss=0.01238, ecapa_loss=0.0002062, whisper_loss=0.1063, over 18445.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01222, ecapa_loss=0.0002907, whisper_loss=0.09847, over 3878887.25 frames. ], batch size: 67, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:53:25,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-10 06:53:29,368 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 06:53:30,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 3.163e+01 3.442e+01 4.144e+01 6.667e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 06:53:46,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=426080.0, ans=0.125 2024-08-10 06:53:48,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=426080.0, ans=0.125 2024-08-10 06:53:58,045 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
25 from LS+wenet, 28 from Vox, 42 from AS 2024-08-10 06:54:05,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=426180.0, ans=0.125 2024-08-10 06:54:19,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=426280.0, ans=0.0 2024-08-10 06:54:22,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426280.0, ans=0.125 2024-08-10 06:54:24,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13650, loss[loss=0.1102, beats_loss=0.01487, ecapa_loss=0.0002308, whisper_loss=0.09304, over 13865.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01227, ecapa_loss=0.0002901, whisper_loss=0.0984, over 3893658.01 frames. ], batch size: 55, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:54:27,360 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 06:54:31,424 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 06:54:35,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426380.0, ans=0.1 2024-08-10 06:54:39,857 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 06:54:40,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.26 vs. limit=10.0 2024-08-10 06:54:50,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=426480.0, ans=0.2 2024-08-10 06:54:56,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.69 vs. 
limit=22.5 2024-08-10 06:55:18,174 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 06:55:18,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2024-08-10 06:55:22,334 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 06:55:25,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=426780.0, ans=0.125 2024-08-10 06:55:26,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=426780.0, ans=0.2 2024-08-10 06:55:27,534 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-10 06:55:28,840 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 06:55:32,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=426880.0, ans=0.0 2024-08-10 06:55:33,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13700, loss[loss=0.09887, beats_loss=0.01197, ecapa_loss=0.0003367, whisper_loss=0.08354, over 20671.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01238, ecapa_loss=0.0002935, whisper_loss=0.09803, over 3909663.82 frames. ], batch size: 82, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:55:33,429 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 06:55:38,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=426880.0, ans=0.125 2024-08-10 06:55:49,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 3.221e+01 3.630e+01 4.052e+01 7.780e+01, threshold=7.261e+01, percent-clipped=2.0 2024-08-10 06:56:03,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=427080.0, ans=0.0 2024-08-10 06:56:04,749 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:56:29,167 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 06:56:43,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13750, loss[loss=0.1046, beats_loss=0.01295, ecapa_loss=0.0002337, whisper_loss=0.08931, over 20459.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01223, ecapa_loss=0.0002968, whisper_loss=0.09839, over 3886854.58 frames. ], batch size: 76, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:56:48,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.75 vs. limit=10.0 2024-08-10 06:57:38,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427780.0, ans=0.125 2024-08-10 06:57:53,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13800, loss[loss=0.1206, beats_loss=0.0118, ecapa_loss=0.0002996, whisper_loss=0.1058, over 21859.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01224, ecapa_loss=0.0002938, whisper_loss=0.09879, over 3910974.92 frames. 
], batch size: 89, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:57:54,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-10 06:57:59,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=427880.0, ans=15.0 2024-08-10 06:58:06,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=427980.0, ans=0.2 2024-08-10 06:58:10,198 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 3.325e+01 3.732e+01 4.469e+01 6.721e+01, threshold=7.464e+01, percent-clipped=0.0 2024-08-10 06:58:13,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2024-08-10 06:58:24,446 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 06:58:26,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=428080.0, ans=0.0 2024-08-10 06:58:27,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428080.0, ans=0.1 2024-08-10 06:58:30,080 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 06:58:35,598 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 06:58:36,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.34 vs. 
limit=15.0 2024-08-10 06:58:50,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=428280.0, ans=0.09899494936611666 2024-08-10 06:59:02,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13850, loss[loss=0.1063, beats_loss=0.01226, ecapa_loss=0.0003418, whisper_loss=0.09066, over 19023.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01228, ecapa_loss=0.0002915, whisper_loss=0.09837, over 3915198.29 frames. ], batch size: 78, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:59:13,579 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 06:59:20,419 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 06:59:21,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=428480.0, ans=0.0 2024-08-10 06:59:22,913 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 06:59:26,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=428480.0, ans=0.125 2024-08-10 06:59:37,731 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 06:59:51,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=428680.0, ans=0.0 2024-08-10 07:00:09,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0 2024-08-10 07:00:09,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13900, loss[loss=0.1054, beats_loss=0.01358, ecapa_loss=0.0003096, whisper_loss=0.08874, over 19869.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01215, ecapa_loss=0.0002964, whisper_loss=0.0987, over 3898657.25 frames. 
], batch size: 82, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:00:11,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=428880.0, ans=0.0 2024-08-10 07:00:17,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0 2024-08-10 07:00:17,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=12.0 2024-08-10 07:00:17,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2024-08-10 07:00:18,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=428880.0, ans=10.0 2024-08-10 07:00:26,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.310e+01 3.794e+01 4.612e+01 1.013e+02, threshold=7.587e+01, percent-clipped=2.0 2024-08-10 07:00:27,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=428980.0, ans=0.125 2024-08-10 07:00:30,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-10 07:00:32,350 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 07:00:36,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=429080.0, ans=0.125 2024-08-10 07:00:56,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=429180.0, ans=0.125 2024-08-10 07:01:12,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2024-08-10 07:01:14,217 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 07:01:18,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 13950, loss[loss=0.108, beats_loss=0.01243, ecapa_loss=0.0003275, whisper_loss=0.09234, over 18626.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01222, ecapa_loss=0.0002959, whisper_loss=0.09839, over 3896334.50 frames. ], batch size: 80, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:01:19,870 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 07:01:21,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=429380.0, ans=0.125 2024-08-10 07:01:27,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=429380.0, ans=0.125 2024-08-10 07:01:31,134 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 07:01:48,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=22.5 2024-08-10 07:02:09,325 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 07:02:13,596 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
29 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 07:02:16,612 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.404e+00 2024-08-10 07:02:26,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=15.0 2024-08-10 07:02:26,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14000, loss[loss=0.09973, beats_loss=0.01538, ecapa_loss=0.0002573, whisper_loss=0.08178, over 21315.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01218, ecapa_loss=0.0002921, whisper_loss=0.09914, over 3907785.31 frames. ], batch size: 88, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:02:32,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=429880.0, ans=0.0 2024-08-10 07:02:43,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 3.368e+01 3.902e+01 4.630e+01 2.044e+02, threshold=7.804e+01, percent-clipped=2.0 2024-08-10 07:02:52,695 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 07:03:10,358 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 07:03:35,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14050, loss[loss=0.1412, beats_loss=0.01032, ecapa_loss=0.0002815, whisper_loss=0.1281, over 24427.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01216, ecapa_loss=0.00029, whisper_loss=0.09938, over 3899733.69 frames. ], batch size: 94, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:04:11,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.95 vs. 
limit=12.0 2024-08-10 07:04:44,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14100, loss[loss=0.08924, beats_loss=0.01521, ecapa_loss=0.0002743, whisper_loss=0.07129, over 16063.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01221, ecapa_loss=0.0002896, whisper_loss=0.09891, over 3878092.75 frames. ], batch size: 66, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:04:48,935 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 07:04:58,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=430980.0, ans=0.0 2024-08-10 07:05:00,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 3.108e+01 3.411e+01 4.014e+01 7.175e+01, threshold=6.821e+01, percent-clipped=1.0 2024-08-10 07:05:00,476 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 07:05:17,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.84 vs. limit=22.5 2024-08-10 07:05:24,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=431180.0, ans=0.125 2024-08-10 07:05:41,689 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 07:05:43,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431280.0, ans=0.125 2024-08-10 07:05:49,724 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 07:05:52,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14150, loss[loss=0.1183, beats_loss=0.008882, ecapa_loss=0.0003374, whisper_loss=0.1061, over 18155.00 frames. 
], tot_loss[loss=0.1135, beats_loss=0.01234, ecapa_loss=0.0002879, whisper_loss=0.09832, over 3887844.58 frames. ], batch size: 70, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:05:55,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=431380.0, ans=0.125 2024-08-10 07:05:57,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=431380.0, ans=0.125 2024-08-10 07:06:02,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=431380.0, ans=0.125 2024-08-10 07:06:08,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2024-08-10 07:06:12,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431480.0, ans=0.1 2024-08-10 07:06:18,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:06:41,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2024-08-10 07:06:43,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431680.0, ans=0.125 2024-08-10 07:06:50,682 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 07:06:57,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431780.0, ans=0.125 2024-08-10 07:06:58,826 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 07:07:01,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14200, loss[loss=0.1038, beats_loss=0.01278, ecapa_loss=0.0002699, whisper_loss=0.0883, over 16200.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01243, ecapa_loss=0.0002861, whisper_loss=0.09744, over 3895668.72 frames. ], batch size: 64, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:07:03,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=431880.0, ans=0.125 2024-08-10 07:07:11,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=431880.0, ans=0.125 2024-08-10 07:07:15,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431980.0, ans=0.125 2024-08-10 07:07:18,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.227e+01 3.786e+01 4.277e+01 7.139e+01, threshold=7.572e+01, percent-clipped=1.0 2024-08-10 07:07:20,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=12.0 2024-08-10 07:07:31,328 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.959e-01 2024-08-10 07:07:49,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=432180.0, ans=0.0 2024-08-10 07:08:11,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14250, loss[loss=0.09502, beats_loss=0.01464, ecapa_loss=0.0002368, whisper_loss=0.07801, over 20363.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01239, ecapa_loss=0.0002849, whisper_loss=0.09711, over 3900254.30 frames. ], batch size: 82, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:08:15,829 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 07:08:17,213 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 07:08:24,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=432480.0, ans=0.0 2024-08-10 07:08:32,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=432480.0, ans=0.2 2024-08-10 07:08:35,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=432480.0, ans=0.0 2024-08-10 07:08:38,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432580.0, ans=0.125 2024-08-10 07:08:47,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=432580.0, ans=0.125 2024-08-10 07:08:52,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=432680.0, ans=0.5 2024-08-10 07:09:03,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=432680.0, ans=0.125 2024-08-10 07:09:07,335 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 07:09:15,074 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 07:09:15,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=432780.0, ans=0.0 2024-08-10 07:09:20,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14300, loss[loss=0.1061, beats_loss=0.01184, ecapa_loss=0.0003539, whisper_loss=0.09072, over 20737.00 frames. 
], tot_loss[loss=0.1123, beats_loss=0.01236, ecapa_loss=0.0002866, whisper_loss=0.09704, over 3884609.32 frames. ], batch size: 89, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:09:21,845 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 07:09:30,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2024-08-10 07:09:36,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 3.216e+01 3.597e+01 4.195e+01 6.015e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:09:46,697 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 07:09:48,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:09:49,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=433080.0, ans=0.0 2024-08-10 07:09:51,776 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 07:09:52,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=433080.0, ans=0.0 2024-08-10 07:09:53,338 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 07:09:53,671 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:10:28,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=433380.0, ans=0.125 2024-08-10 07:10:28,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14350, loss[loss=0.1307, beats_loss=0.01088, ecapa_loss=0.000257, whisper_loss=0.1172, over 21254.00 frames. 
], tot_loss[loss=0.112, beats_loss=0.01237, ecapa_loss=0.0002851, whisper_loss=0.09683, over 3899908.07 frames. ], batch size: 80, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:10:33,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=433380.0, ans=0.125 2024-08-10 07:10:35,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=433380.0, ans=0.0 2024-08-10 07:10:38,042 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 07:10:42,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=433480.0, ans=0.125 2024-08-10 07:10:43,733 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 07:10:51,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=433480.0, ans=0.0 2024-08-10 07:10:57,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=433580.0, ans=0.0 2024-08-10 07:11:08,744 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 07:11:14,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=433680.0, ans=0.05 2024-08-10 07:11:35,418 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-10 07:11:35,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433880.0, ans=0.125 2024-08-10 07:11:36,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14400, loss[loss=0.1059, beats_loss=0.01533, ecapa_loss=0.000219, whisper_loss=0.08837, over 22669.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01228, ecapa_loss=0.0002877, whisper_loss=0.09786, over 3929462.09 frames. ], batch size: 89, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:11:45,267 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 07:11:53,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 3.382e+01 3.755e+01 4.286e+01 6.808e+01, threshold=7.511e+01, percent-clipped=0.0 2024-08-10 07:11:53,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=433980.0, ans=0.2 2024-08-10 07:11:59,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=433980.0, ans=0.0 2024-08-10 07:12:33,536 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.406e-02 2024-08-10 07:12:42,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=434280.0, ans=0.2 2024-08-10 07:12:45,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 3, batch 14450, loss[loss=0.1114, beats_loss=0.01178, ecapa_loss=0.0002505, whisper_loss=0.09714, over 18529.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01239, ecapa_loss=0.0002887, whisper_loss=0.09784, over 3965020.58 frames. 
], batch size: 71, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:12:49,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=434380.0, ans=0.125 2024-08-10 07:12:51,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434380.0, ans=0.1 2024-08-10 07:13:04,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=434480.0, ans=15.0 2024-08-10 07:13:05,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=434480.0, ans=0.2 2024-08-10 07:13:09,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=434480.0, ans=0.05 2024-08-10 07:13:32,256 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 07:14:14,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 0, loss[loss=0.09189, beats_loss=0.01771, ecapa_loss=0.0002193, whisper_loss=0.07199, over 19687.00 frames. ], tot_loss[loss=0.09189, beats_loss=0.01771, ecapa_loss=0.0002193, whisper_loss=0.07199, over 19687.00 frames. ], batch size: 80, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:14:14,461 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 07:14:55,843 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on ASR_libri: loss=0.268, beats_loss=0, ecapa_loss=0.0008857, whisper_loss=0.2592, over 922467.00 frames. 2024-08-10 07:15:10,851 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on SV_voxceleb1: loss=0.007801, beats_loss=0, ecapa_loss=0.0007801, whisper_loss=0, over 939242.00 frames. 
2024-08-10 07:17:09,567 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on AT_audioset: loss=0.02834, beats_loss=0.02834, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 07:17:09,570 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 07:17:13,919 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 07:17:19,053 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 07:17:42,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=434870.0, ans=0.0 2024-08-10 07:18:12,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.318e+01 3.888e+01 4.583e+01 8.270e+01, threshold=7.777e+01, percent-clipped=1.0 2024-08-10 07:18:15,610 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 07:18:21,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434970.0, ans=0.125 2024-08-10 07:18:30,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=435070.0, ans=0.04949747468305833 2024-08-10 07:18:39,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=435070.0, ans=0.125 2024-08-10 07:19:19,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 50, loss[loss=0.1019, beats_loss=0.01397, ecapa_loss=0.0002668, whisper_loss=0.0853, over 14095.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01246, ecapa_loss=0.0002924, whisper_loss=0.09531, over 869897.43 frames. 
], batch size: 55, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:19:45,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=435370.0, ans=0.125 2024-08-10 07:19:50,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=435370.0, ans=0.0 2024-08-10 07:20:03,676 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 07:20:22,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=435470.0, ans=0.0 2024-08-10 07:20:52,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2024-08-10 07:21:21,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 100, loss[loss=0.08672, beats_loss=0.01216, ecapa_loss=0.000272, whisper_loss=0.07184, over 15220.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01229, ecapa_loss=0.0002885, whisper_loss=0.09512, over 1505557.41 frames. ], batch size: 60, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:21:40,572 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 07:21:41,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-08-10 07:21:47,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-10 07:21:58,804 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 07:22:04,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.37 vs. 
limit=12.0 2024-08-10 07:22:14,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=12.0 2024-08-10 07:22:14,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.841e+01 3.372e+01 3.715e+01 4.340e+01 6.479e+01, threshold=7.429e+01, percent-clipped=0.0 2024-08-10 07:22:26,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=435970.0, ans=15.0 2024-08-10 07:22:49,498 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 42 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 07:22:54,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=436170.0, ans=0.5 2024-08-10 07:23:03,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=436170.0, ans=0.125 2024-08-10 07:23:14,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 150, loss[loss=0.1008, beats_loss=0.01491, ecapa_loss=0.0002378, whisper_loss=0.08352, over 22158.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01216, ecapa_loss=0.0002864, whisper_loss=0.09506, over 2009326.83 frames. ], batch size: 92, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:23:20,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=436270.0, ans=0.125 2024-08-10 07:23:26,684 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 07:23:41,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=436370.0, ans=0.0 2024-08-10 07:23:54,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=436470.0, ans=0.125 2024-08-10 07:24:04,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=436570.0, ans=0.2 2024-08-10 07:24:14,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=436570.0, ans=0.09899494936611666 2024-08-10 07:24:22,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436670.0, ans=0.125 2024-08-10 07:24:23,755 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 07:24:27,267 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 07:24:29,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436670.0, ans=0.1 2024-08-10 07:24:30,114 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 from AS 2024-08-10 07:24:30,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=436670.0, ans=0.025 2024-08-10 07:24:38,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 200, loss[loss=0.1117, beats_loss=0.01215, ecapa_loss=0.0003352, whisper_loss=0.09619, over 18529.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01197, ecapa_loss=0.0002848, whisper_loss=0.09586, over 2392575.10 frames. 
], batch size: 78, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:25:04,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=436870.0, ans=10.0 2024-08-10 07:25:12,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-08-10 07:25:14,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.326e+01 3.682e+01 4.488e+01 7.047e+01, threshold=7.364e+01, percent-clipped=0.0 2024-08-10 07:25:17,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=436970.0, ans=12.0 2024-08-10 07:25:19,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436970.0, ans=0.1 2024-08-10 07:25:20,566 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 07:25:20,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=436970.0, ans=0.125 2024-08-10 07:25:30,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437070.0, ans=0.125 2024-08-10 07:25:34,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=437070.0, ans=0.125 2024-08-10 07:25:57,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 250, loss[loss=0.1212, beats_loss=0.01328, ecapa_loss=0.0002836, whisper_loss=0.1051, over 18603.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01188, ecapa_loss=0.0002873, whisper_loss=0.0962, over 2689177.15 frames. 
], batch size: 74, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:26:07,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437270.0, ans=0.125 2024-08-10 07:26:14,459 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 from AS 2024-08-10 07:26:16,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-08-10 07:26:19,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=437370.0, ans=0.0 2024-08-10 07:26:22,235 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 07:26:26,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=437470.0, ans=0.125 2024-08-10 07:26:37,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=437470.0, ans=0.125 2024-08-10 07:26:40,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=437470.0, ans=0.0 2024-08-10 07:26:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=437670.0, ans=0.0 2024-08-10 07:27:01,071 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 from AS 2024-08-10 07:27:05,938 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 from AS 2024-08-10 07:27:06,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. 
limit=15.0 2024-08-10 07:27:12,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 300, loss[loss=0.08931, beats_loss=0.01541, ecapa_loss=0.0002308, whisper_loss=0.07159, over 14829.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01183, ecapa_loss=0.000288, whisper_loss=0.09699, over 2934976.86 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:27:22,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=437770.0, ans=0.05 2024-08-10 07:27:23,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2024-08-10 07:27:28,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=437870.0, ans=0.2 2024-08-10 07:27:33,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=22.5 2024-08-10 07:27:41,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=437970.0, ans=0.125 2024-08-10 07:27:45,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437970.0, ans=0.125 2024-08-10 07:27:46,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.34 vs. 
limit=15.0 2024-08-10 07:27:46,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.173e+01 3.597e+01 4.305e+01 6.522e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:27:55,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=437970.0, ans=0.125 2024-08-10 07:28:00,281 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 from AS 2024-08-10 07:28:10,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=438070.0, ans=0.5 2024-08-10 07:28:11,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=438170.0, ans=0.0 2024-08-10 07:28:11,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=438170.0, ans=0.125 2024-08-10 07:28:14,315 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 07:28:25,988 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS 2024-08-10 07:28:27,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 350, loss[loss=0.1155, beats_loss=0.01027, ecapa_loss=0.0002973, whisper_loss=0.1022, over 13742.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01195, ecapa_loss=0.0002855, whisper_loss=0.09662, over 3132150.45 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:28:35,652 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 07:29:07,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438470.0, ans=0.1 2024-08-10 07:29:12,650 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-10 07:29:24,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2024-08-10 07:29:27,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2024-08-10 07:29:28,342 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 from AS 2024-08-10 07:29:42,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 400, loss[loss=0.09768, beats_loss=0.01349, ecapa_loss=0.0002887, whisper_loss=0.0813, over 22521.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01212, ecapa_loss=0.0002812, whisper_loss=0.09605, over 3292126.61 frames. ], batch size: 93, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:29:48,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2024-08-10 07:29:56,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=438870.0, ans=0.125 2024-08-10 07:30:00,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=438870.0, ans=0.0 2024-08-10 07:30:08,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=438870.0, ans=0.125 2024-08-10 07:30:16,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 3.285e+01 3.710e+01 4.185e+01 8.184e+01, threshold=7.420e+01, percent-clipped=1.0 2024-08-10 07:30:27,736 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 12 from Vox, 42 from AS 2024-08-10 07:30:34,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=439070.0, ans=0.0 2024-08-10 07:30:38,023 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 07:30:54,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 450, loss[loss=0.1158, beats_loss=0.01053, ecapa_loss=0.0002475, whisper_loss=0.1028, over 19046.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01209, ecapa_loss=0.0002794, whisper_loss=0.09647, over 3436671.19 frames. ], batch size: 75, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:31:04,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=439270.0, ans=0.125 2024-08-10 07:31:23,138 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 07:31:30,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=439470.0, ans=0.125 2024-08-10 07:31:31,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439470.0, ans=0.1 2024-08-10 07:31:34,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439570.0, ans=0.125 2024-08-10 07:31:45,318 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 07:31:48,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=439670.0, ans=0.0 2024-08-10 07:31:49,142 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
15 from LS+wenet, 21 from Vox, 37 from AS 2024-08-10 07:31:49,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=439670.0, ans=0.125 2024-08-10 07:32:00,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 500, loss[loss=0.1317, beats_loss=0.01041, ecapa_loss=0.0002981, whisper_loss=0.1183, over 17697.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01205, ecapa_loss=0.0002759, whisper_loss=0.09702, over 3546609.04 frames. ], batch size: 70, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:32:02,277 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 from AS 2024-08-10 07:32:09,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-10 07:32:16,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=439870.0, ans=0.125 2024-08-10 07:32:33,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.971e+01 3.310e+01 3.858e+01 7.927e+01, threshold=6.621e+01, percent-clipped=1.0 2024-08-10 07:32:39,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439970.0, ans=0.125 2024-08-10 07:32:46,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-10 07:33:09,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 550, loss[loss=0.1203, beats_loss=0.008441, ecapa_loss=0.0003364, whisper_loss=0.1085, over 22336.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01204, ecapa_loss=0.0002729, whisper_loss=0.09703, over 3590351.71 frames. 
], batch size: 92, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:33:21,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=440370.0, ans=0.0 2024-08-10 07:33:22,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2024-08-10 07:33:23,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=440370.0, ans=0.125 2024-08-10 07:33:26,556 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 21 from Vox, 49 from AS 2024-08-10 07:33:40,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2024-08-10 07:33:42,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440470.0, ans=0.1 2024-08-10 07:33:48,024 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 07:33:50,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=440570.0, ans=0.2 2024-08-10 07:33:53,030 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-10 07:34:00,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=440570.0, ans=0.125 2024-08-10 07:34:00,978 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 17 from Vox, 33 from AS 2024-08-10 07:34:01,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=440670.0, ans=0.125 2024-08-10 07:34:14,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 600, loss[loss=0.1217, beats_loss=0.01025, ecapa_loss=0.0002493, whisper_loss=0.1089, over 21904.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01202, ecapa_loss=0.0002691, whisper_loss=0.09727, over 3663746.07 frames. ], batch size: 84, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:34:15,080 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 from AS 2024-08-10 07:34:26,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440770.0, ans=0.1 2024-08-10 07:34:37,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=440870.0, ans=0.125 2024-08-10 07:34:45,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 3.004e+01 3.329e+01 3.797e+01 6.092e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 07:34:49,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=440970.0, ans=0.125 2024-08-10 07:34:54,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=441070.0, ans=0.1 2024-08-10 07:35:10,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=441170.0, ans=0.0 2024-08-10 07:35:11,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441170.0, ans=0.125 2024-08-10 07:35:20,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 650, loss[loss=0.1002, beats_loss=0.01379, 
ecapa_loss=0.0002319, whisper_loss=0.08408, over 22301.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01207, ecapa_loss=0.0002673, whisper_loss=0.09698, over 3718040.13 frames. ], batch size: 89, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:35:32,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=441370.0, ans=0.125 2024-08-10 07:35:43,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.90 vs. limit=22.5 2024-08-10 07:35:44,277 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.820e-02 2024-08-10 07:35:52,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2024-08-10 07:36:03,074 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.728e-02 2024-08-10 07:36:05,203 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS 2024-08-10 07:36:05,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=441570.0, ans=0.125 2024-08-10 07:36:16,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.61 vs. limit=22.5 2024-08-10 07:36:20,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441670.0, ans=0.1 2024-08-10 07:36:24,045 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 07:36:26,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 700, loss[loss=0.1233, beats_loss=0.007687, ecapa_loss=0.0002601, whisper_loss=0.113, over 20153.00 frames. 
], tot_loss[loss=0.1126, beats_loss=0.01207, ecapa_loss=0.0002698, whisper_loss=0.09781, over 3777868.97 frames. ], batch size: 74, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:36:37,307 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 from AS 2024-08-10 07:36:37,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=441770.0, ans=0.125 2024-08-10 07:36:50,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=441870.0, ans=0.0 2024-08-10 07:36:56,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 3.137e+01 3.551e+01 4.143e+01 1.211e+02, threshold=7.103e+01, percent-clipped=4.0 2024-08-10 07:37:06,129 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 07:37:06,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=442070.0, ans=0.0 2024-08-10 07:37:06,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=442070.0, ans=0.0 2024-08-10 07:37:08,670 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 19 from Vox, 46 from AS 2024-08-10 07:37:32,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 750, loss[loss=0.1294, beats_loss=0.01013, ecapa_loss=0.0002419, whisper_loss=0.1169, over 21143.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01213, ecapa_loss=0.0002692, whisper_loss=0.09729, over 3825006.31 frames. ], batch size: 79, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:37:32,477 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 07:37:38,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=442270.0, ans=10.0 2024-08-10 07:37:50,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=442370.0, ans=0.125 2024-08-10 07:37:51,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=442370.0, ans=0.07 2024-08-10 07:37:52,088 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 from AS 2024-08-10 07:37:54,540 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 07:38:02,425 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 07:38:02,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=442470.0, ans=0.125 2024-08-10 07:38:18,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=442570.0, ans=0.0 2024-08-10 07:38:37,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 800, loss[loss=0.1139, beats_loss=0.01061, ecapa_loss=0.0002768, whisper_loss=0.1006, over 16902.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01206, ecapa_loss=0.0002707, whisper_loss=0.09705, over 3850164.34 frames. ], batch size: 66, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:38:53,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-10 07:39:07,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 2.938e+01 3.331e+01 3.852e+01 7.963e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-10 07:39:14,291 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 07:39:33,946 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 07:39:39,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=443170.0, ans=0.125 2024-08-10 07:39:40,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=12.0 2024-08-10 07:39:41,534 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 07:39:42,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 850, loss[loss=0.1157, beats_loss=0.0124, ecapa_loss=0.0002569, whisper_loss=0.1007, over 20980.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01212, ecapa_loss=0.0002667, whisper_loss=0.09594, over 3842036.68 frames. ], batch size: 80, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:39:45,705 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 from AS 2024-08-10 07:39:46,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=443270.0, ans=0.125 2024-08-10 07:39:53,972 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
24 from LS+wenet, 20 from Vox, 28 from AS 2024-08-10 07:39:55,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:00,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:04,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:05,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:13,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=443470.0, ans=0.0 2024-08-10 07:40:26,788 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 07:40:32,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443570.0, ans=0.125 2024-08-10 07:40:33,381 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 07:40:33,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443570.0, ans=0.1 2024-08-10 07:40:42,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443670.0, ans=0.1 2024-08-10 07:40:44,732 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 from AS 2024-08-10 07:40:48,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 900, loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0003162, whisper_loss=0.09073, over 16592.00 frames. 
], tot_loss[loss=0.1111, beats_loss=0.01202, ecapa_loss=0.0002658, whisper_loss=0.09642, over 3791117.89 frames. ], batch size: 69, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:40:59,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=443770.0, ans=0.015 2024-08-10 07:41:13,419 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 07:41:18,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.112e+01 3.456e+01 3.897e+01 5.995e+01, threshold=6.912e+01, percent-clipped=0.0 2024-08-10 07:41:29,134 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 07:41:29,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=444070.0, ans=0.125 2024-08-10 07:41:53,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 950, loss[loss=0.09343, beats_loss=0.01249, ecapa_loss=0.0003059, whisper_loss=0.07787, over 21527.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01204, ecapa_loss=0.000265, whisper_loss=0.09613, over 3790927.78 frames. ], batch size: 91, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:42:01,953 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 38 from LS+wenet, 23 from Vox, 18 from AS 2024-08-10 07:42:03,142 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 07:42:23,775 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
25 from LS+wenet, 27 from Vox, 31 from AS 2024-08-10 07:42:54,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444670.0, ans=0.125 2024-08-10 07:42:56,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444670.0, ans=0.1 2024-08-10 07:42:59,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1000, loss[loss=0.1012, beats_loss=0.009779, ecapa_loss=0.0002991, whisper_loss=0.08842, over 13694.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002657, whisper_loss=0.09616, over 3781993.73 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:43:16,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.43 vs. limit=15.0 2024-08-10 07:43:28,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=444970.0, ans=0.125 2024-08-10 07:43:29,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.226e+01 3.648e+01 4.312e+01 7.271e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 07:43:34,812 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 from AS 2024-08-10 07:43:51,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-10 07:43:58,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=445170.0, ans=0.125 2024-08-10 07:44:00,172 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
31 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 07:44:04,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1050, loss[loss=0.1105, beats_loss=0.008904, ecapa_loss=0.0002554, whisper_loss=0.09907, over 16376.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01206, ecapa_loss=0.0002664, whisper_loss=0.09642, over 3790205.47 frames. ], batch size: 65, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:44:18,073 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 07:44:19,351 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 from AS 2024-08-10 07:44:25,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445370.0, ans=0.1 2024-08-10 07:44:30,802 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 from AS 2024-08-10 07:44:41,310 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 26 from Vox, 26 from AS 2024-08-10 07:44:45,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=445570.0, ans=0.0 2024-08-10 07:44:48,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-10 07:44:50,262 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 26 from LS+wenet, 27 from Vox, 43 from AS 2024-08-10 07:44:58,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=445670.0, ans=0.0 2024-08-10 07:45:09,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1100, loss[loss=0.1213, beats_loss=0.012, ecapa_loss=0.0002309, whisper_loss=0.107, over 21150.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01207, ecapa_loss=0.0002651, whisper_loss=0.09656, over 3798553.96 frames. 
], batch size: 82, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:45:10,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=445770.0, ans=0.0 2024-08-10 07:45:14,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445770.0, ans=0.1 2024-08-10 07:45:26,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=445870.0, ans=0.0 2024-08-10 07:45:32,036 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.035e-03 2024-08-10 07:45:33,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=445870.0, ans=0.125 2024-08-10 07:45:39,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.161e+01 3.477e+01 3.934e+01 8.780e+01, threshold=6.953e+01, percent-clipped=2.0 2024-08-10 07:45:40,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445970.0, ans=0.1 2024-08-10 07:45:42,372 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 07:45:57,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=446070.0, ans=0.125 2024-08-10 07:46:01,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=446170.0, ans=0.125 2024-08-10 07:46:02,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446170.0, ans=0.1 2024-08-10 07:46:14,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1150, loss[loss=0.1067, beats_loss=0.01146, ecapa_loss=0.0003099, whisper_loss=0.09212, over 22033.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01217, ecapa_loss=0.000264, whisper_loss=0.09619, over 3802141.23 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:46:24,095 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 07:46:24,375 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:46:31,625 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 07:46:51,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446470.0, ans=0.1 2024-08-10 07:46:59,256 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 07:47:09,967 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 07:47:15,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=446670.0, ans=0.125 2024-08-10 07:47:20,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1200, loss[loss=0.09267, beats_loss=0.01369, ecapa_loss=0.0002468, whisper_loss=0.07652, over 14129.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01212, ecapa_loss=0.0002638, whisper_loss=0.09623, over 3762138.77 frames. ], batch size: 53, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:47:25,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=446770.0, ans=0.0 2024-08-10 07:47:33,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=446870.0, ans=0.125 2024-08-10 07:47:41,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=446870.0, ans=0.125 2024-08-10 07:47:47,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=446970.0, ans=0.0 2024-08-10 07:47:50,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.044e+01 3.412e+01 3.944e+01 6.015e+01, threshold=6.823e+01, percent-clipped=0.0 2024-08-10 07:48:16,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447170.0, ans=0.1 2024-08-10 07:48:16,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=447170.0, ans=0.125 2024-08-10 07:48:28,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1250, loss[loss=0.1257, beats_loss=0.01208, ecapa_loss=0.0002267, whisper_loss=0.1113, over 25400.00 frames. 
], tot_loss[loss=0.111, beats_loss=0.01208, ecapa_loss=0.0002628, whisper_loss=0.09629, over 3764822.03 frames. ], batch size: 94, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:48:41,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=447370.0, ans=0.0 2024-08-10 07:48:46,049 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.631e+00 2024-08-10 07:48:55,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447470.0, ans=0.125 2024-08-10 07:48:58,582 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 07:48:58,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=447470.0, ans=0.0 2024-08-10 07:49:03,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=447470.0, ans=0.125 2024-08-10 07:49:04,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=447470.0, ans=0.125 2024-08-10 07:49:30,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=447670.0, ans=0.125 2024-08-10 07:49:32,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=447670.0, ans=0.125 2024-08-10 07:49:39,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1300, loss[loss=0.09594, beats_loss=0.01482, ecapa_loss=0.000232, whisper_loss=0.07881, over 19980.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01213, ecapa_loss=0.0002614, whisper_loss=0.09617, over 3792493.02 frames. 
], batch size: 79, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:49:43,321 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:49:43,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2024-08-10 07:49:50,164 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 07:49:56,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=447870.0, ans=0.035 2024-08-10 07:50:04,265 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 07:50:11,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2024-08-10 07:50:12,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 3.001e+01 3.337e+01 3.796e+01 6.277e+01, threshold=6.674e+01, percent-clipped=0.0 2024-08-10 07:50:24,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=448070.0, ans=0.1 2024-08-10 07:50:27,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=448070.0, ans=0.125 2024-08-10 07:50:29,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2024-08-10 07:50:32,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=448070.0, ans=0.125 2024-08-10 07:50:45,366 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 07:50:50,227 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 07:50:51,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1350, loss[loss=0.1029, beats_loss=0.01295, ecapa_loss=0.0002333, whisper_loss=0.08764, over 18036.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01207, ecapa_loss=0.0002622, whisper_loss=0.09591, over 3780388.84 frames. ], batch size: 72, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:51:09,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448370.0, ans=0.1 2024-08-10 07:51:18,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=448370.0, ans=0.0 2024-08-10 07:51:22,988 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 07:51:30,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=448470.0, ans=0.0 2024-08-10 07:51:43,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448570.0, ans=0.125 2024-08-10 07:51:52,656 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.133e-01 2024-08-10 07:51:54,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=448670.0, ans=0.125 2024-08-10 07:52:03,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1400, loss[loss=0.1165, beats_loss=0.01359, ecapa_loss=0.0002885, whisper_loss=0.1, over 18047.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01206, ecapa_loss=0.0002619, whisper_loss=0.0958, over 3765952.57 frames. 
], batch size: 71, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:52:04,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=448770.0, ans=0.5 2024-08-10 07:52:14,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=448770.0, ans=0.0 2024-08-10 07:52:30,192 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 07:52:37,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.977e+01 3.358e+01 3.939e+01 6.744e+01, threshold=6.717e+01, percent-clipped=2.0 2024-08-10 07:52:57,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=449070.0, ans=0.125 2024-08-10 07:53:09,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=449170.0, ans=0.02 2024-08-10 07:53:17,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1450, loss[loss=0.123, beats_loss=0.01022, ecapa_loss=0.000205, whisper_loss=0.1107, over 16312.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01205, ecapa_loss=0.0002639, whisper_loss=0.09592, over 3792626.40 frames. ], batch size: 58, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:53:54,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2024-08-10 07:53:58,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2024-08-10 07:54:02,971 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 07:54:09,191 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-10 07:54:13,936 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 07:54:31,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=449570.0, ans=0.0 2024-08-10 07:54:41,284 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 07:54:43,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449570.0, ans=0.1 2024-08-10 07:54:51,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449670.0, ans=0.1 2024-08-10 07:54:53,236 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 07:55:00,475 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1500, loss[loss=0.1085, beats_loss=0.01461, ecapa_loss=0.0001844, whisper_loss=0.09209, over 19645.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01208, ecapa_loss=0.000261, whisper_loss=0.09528, over 3787310.06 frames. ], batch size: 75, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:55:03,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-08-10 07:55:30,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449970.0, ans=0.125 2024-08-10 07:55:30,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. 
limit=6.0 2024-08-10 07:55:35,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.938e+01 3.327e+01 3.975e+01 6.102e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-10 07:55:49,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2024-08-10 07:56:05,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450170.0, ans=0.125 2024-08-10 07:56:11,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=450170.0, ans=0.0 2024-08-10 07:56:14,063 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 13 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 07:56:16,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1550, loss[loss=0.1455, beats_loss=0.01041, ecapa_loss=0.0003302, whisper_loss=0.1318, over 17342.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01208, ecapa_loss=0.0002604, whisper_loss=0.09578, over 3787568.01 frames. ], batch size: 70, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:56:21,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2024-08-10 07:56:21,718 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 07:56:32,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=450370.0, ans=0.125 2024-08-10 07:56:35,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=450370.0, ans=0.125 2024-08-10 07:56:45,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450370.0, ans=0.1 2024-08-10 07:57:05,036 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 07:57:17,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450670.0, ans=0.1 2024-08-10 07:57:20,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2024-08-10 07:57:23,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=450670.0, ans=0.125 2024-08-10 07:57:32,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1600, loss[loss=0.1107, beats_loss=0.009058, ecapa_loss=0.0002992, whisper_loss=0.09867, over 18292.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01203, ecapa_loss=0.0002625, whisper_loss=0.09554, over 3788614.00 frames. ], batch size: 72, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:57:42,936 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 07:57:55,588 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 07:58:00,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=450870.0, ans=0.0 2024-08-10 07:58:03,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=450970.0, ans=0.1 2024-08-10 07:58:07,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 3.094e+01 3.435e+01 3.999e+01 7.884e+01, threshold=6.871e+01, percent-clipped=1.0 2024-08-10 07:58:13,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=450970.0, ans=0.04949747468305833 2024-08-10 07:58:14,582 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 07:58:33,440 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 07:58:46,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1650, loss[loss=0.103, beats_loss=0.01484, ecapa_loss=0.0002711, whisper_loss=0.08546, over 18145.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01205, ecapa_loss=0.0002614, whisper_loss=0.09597, over 3791172.35 frames. ], batch size: 74, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:58:51,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=451270.0, ans=0.0 2024-08-10 07:59:00,013 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 07:59:20,706 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 07:59:25,426 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.256e+00 2024-08-10 07:59:38,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=451570.0, ans=0.0 2024-08-10 07:59:52,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=451670.0, ans=0.125 2024-08-10 07:59:58,960 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1700, loss[loss=0.09127, beats_loss=0.01188, ecapa_loss=0.0002427, whisper_loss=0.07696, over 16991.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01188, ecapa_loss=0.0002634, whisper_loss=0.09668, over 3781508.17 frames. ], batch size: 68, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:00:14,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-10 08:00:15,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=451870.0, ans=0.125 2024-08-10 08:00:31,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.130e+01 3.389e+01 3.948e+01 7.641e+01, threshold=6.778e+01, percent-clipped=2.0 2024-08-10 08:00:46,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-10 08:00:57,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452170.0, ans=0.1 2024-08-10 08:01:08,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1750, loss[loss=0.1242, beats_loss=0.01116, ecapa_loss=0.0002232, whisper_loss=0.1108, over 23757.00 frames. 
], tot_loss[loss=0.1113, beats_loss=0.01193, ecapa_loss=0.0002621, whisper_loss=0.09676, over 3820741.17 frames. ], batch size: 91, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:01:22,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=452370.0, ans=0.0 2024-08-10 08:01:47,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=452470.0, ans=0.2 2024-08-10 08:02:02,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=452570.0, ans=15.0 2024-08-10 08:02:06,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452670.0, ans=0.1 2024-08-10 08:02:11,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0 2024-08-10 08:02:11,597 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 37 from Vox, 25 fro AS 2024-08-10 08:02:13,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=452670.0, ans=0.125 2024-08-10 08:02:18,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1800, loss[loss=0.09874, beats_loss=0.01374, ecapa_loss=0.0002918, whisper_loss=0.08208, over 22009.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01189, ecapa_loss=0.0002623, whisper_loss=0.09684, over 3840542.06 frames. ], batch size: 91, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:02:22,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2024-08-10 08:02:27,920 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-10 08:02:38,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=452870.0, ans=0.125 2024-08-10 08:02:41,193 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 08:02:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=452970.0, ans=0.125 2024-08-10 08:02:49,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 3.196e+01 3.582e+01 4.110e+01 5.783e+01, threshold=7.164e+01, percent-clipped=0.0 2024-08-10 08:02:58,319 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 08:03:08,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2024-08-10 08:03:14,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453170.0, ans=0.125 2024-08-10 08:03:26,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-10 08:03:26,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1850, loss[loss=0.115, beats_loss=0.01374, ecapa_loss=0.0002497, whisper_loss=0.09878, over 18573.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01189, ecapa_loss=0.0002636, whisper_loss=0.09714, over 3828264.02 frames. ], batch size: 73, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:03:33,991 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 08:03:39,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=453270.0, ans=0.125 2024-08-10 08:03:46,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=453370.0, ans=0.0 2024-08-10 08:03:47,646 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 08:03:51,764 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 08:03:52,147 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.717e-01 2024-08-10 08:04:06,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 08:04:18,484 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 08:04:28,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453670.0, ans=0.125 2024-08-10 08:04:36,440 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 08:04:38,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=453770.0, ans=0.125 2024-08-10 08:04:39,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1900, loss[loss=0.1045, beats_loss=0.01298, ecapa_loss=0.0002585, whisper_loss=0.08898, over 22155.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01193, ecapa_loss=0.0002657, whisper_loss=0.09645, over 3816306.48 frames. 
], batch size: 91, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:04:44,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.00 vs. limit=22.5 2024-08-10 08:04:51,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0 2024-08-10 08:04:52,144 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-10 08:05:07,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=453970.0, ans=0.2 2024-08-10 08:05:10,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 3.027e+01 3.393e+01 3.845e+01 7.336e+01, threshold=6.786e+01, percent-clipped=1.0 2024-08-10 08:05:27,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=454070.0, ans=0.0 2024-08-10 08:05:28,159 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 08:05:33,451 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 08:05:38,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=454170.0, ans=0.2 2024-08-10 08:05:44,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=454170.0, ans=0.95 2024-08-10 08:05:44,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2024-08-10 08:05:49,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 1950, loss[loss=0.122, beats_loss=0.01154, ecapa_loss=0.0002962, whisper_loss=0.1075, over 22256.00 frames. 
], tot_loss[loss=0.1111, beats_loss=0.01207, ecapa_loss=0.0002702, whisper_loss=0.09631, over 3821655.75 frames. ], batch size: 90, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:06:00,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2024-08-10 08:06:18,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=454470.0, ans=0.125 2024-08-10 08:06:18,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=454470.0, ans=0.125 2024-08-10 08:06:32,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=454570.0, ans=0.0 2024-08-10 08:06:45,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=454670.0, ans=0.2 2024-08-10 08:07:00,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=454770.0, ans=0.2 2024-08-10 08:07:00,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2000, loss[loss=0.09975, beats_loss=0.01454, ecapa_loss=0.0002441, whisper_loss=0.08278, over 18070.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01214, ecapa_loss=0.0002723, whisper_loss=0.09574, over 3832487.27 frames. ], batch size: 74, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:07:00,857 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-10 08:07:23,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. 
limit=10.0 2024-08-10 08:07:24,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=454870.0, ans=0.5 2024-08-10 08:07:27,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2024-08-10 08:07:34,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.702e+01 4.234e+01 5.771e+01, threshold=7.405e+01, percent-clipped=0.0 2024-08-10 08:07:59,970 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 08:08:13,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2050, loss[loss=0.08733, beats_loss=0.01686, ecapa_loss=0.0002771, whisper_loss=0.0677, over 14345.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01214, ecapa_loss=0.0002766, whisper_loss=0.09673, over 3875642.33 frames. ], batch size: 62, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:08:17,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455270.0, ans=0.1 2024-08-10 08:08:23,165 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 08:08:30,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. 
limit=10.0 2024-08-10 08:08:41,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=455470.0, ans=0.0 2024-08-10 08:08:45,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455470.0, ans=0.1 2024-08-10 08:08:48,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=455470.0, ans=0.125 2024-08-10 08:08:52,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=455470.0, ans=0.02 2024-08-10 08:08:59,168 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 08:09:20,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=455670.0, ans=0.0 2024-08-10 08:09:24,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2100, loss[loss=0.09932, beats_loss=0.01251, ecapa_loss=0.0003085, whisper_loss=0.08372, over 17527.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01237, ecapa_loss=0.0002735, whisper_loss=0.09542, over 3864530.97 frames. ], batch size: 70, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:09:27,828 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.858e+05 2024-08-10 08:09:31,436 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 08:09:41,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2024-08-10 08:09:53,275 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 08:09:54,753 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-10 08:09:56,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=455970.0, ans=0.0 2024-08-10 08:09:57,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.943e+01 3.340e+01 3.951e+01 7.714e+01, threshold=6.679e+01, percent-clipped=1.0 2024-08-10 08:10:07,770 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 08:10:12,740 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 08:10:16,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=456070.0, ans=0.125 2024-08-10 08:10:17,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=456070.0, ans=0.125 2024-08-10 08:10:31,716 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 08:10:36,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2150, loss[loss=0.08627, beats_loss=0.01537, ecapa_loss=0.0002523, whisper_loss=0.06838, over 16856.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01242, ecapa_loss=0.0002748, whisper_loss=0.09517, over 3855309.96 frames. ], batch size: 67, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:10:47,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=456270.0, ans=0.125 2024-08-10 08:11:03,778 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 08:11:13,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=456470.0, ans=0.125 2024-08-10 08:11:51,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2200, loss[loss=0.1282, beats_loss=0.01265, ecapa_loss=0.0002434, whisper_loss=0.1131, over 15909.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01234, ecapa_loss=0.0002758, whisper_loss=0.09506, over 3831042.85 frames. ], batch size: 62, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:11:58,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2024-08-10 08:12:07,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=456870.0, ans=0.0 2024-08-10 08:12:20,455 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 08:12:26,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.107e+01 3.618e+01 4.202e+01 6.900e+01, threshold=7.235e+01, percent-clipped=1.0 2024-08-10 08:12:40,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=457070.0, ans=0.125 2024-08-10 08:12:45,851 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-10 08:12:51,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457170.0, ans=0.1 2024-08-10 08:13:00,108 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 08:13:05,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2250, loss[loss=0.1088, beats_loss=0.01354, ecapa_loss=0.0003058, whisper_loss=0.09219, over 21644.00 frames. 
], tot_loss[loss=0.1107, beats_loss=0.01236, ecapa_loss=0.0002759, whisper_loss=0.09562, over 3826352.11 frames. ], batch size: 88, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:13:19,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-08-10 08:13:29,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=457370.0, ans=0.2 2024-08-10 08:13:31,800 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:13:34,298 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 08:13:48,544 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 08:13:49,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=457470.0, ans=22.5 2024-08-10 08:13:51,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=30.75 vs. limit=15.0 2024-08-10 08:13:52,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=457570.0, ans=0.125 2024-08-10 08:14:13,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=15.0 2024-08-10 08:14:21,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2300, loss[loss=0.08045, beats_loss=0.0156, ecapa_loss=0.0002545, whisper_loss=0.0623, over 18096.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01237, ecapa_loss=0.0002771, whisper_loss=0.0952, over 3843100.40 frames. 
], batch size: 78, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:14:56,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 3.052e+01 3.526e+01 3.987e+01 6.394e+01, threshold=7.053e+01, percent-clipped=0.0 2024-08-10 08:14:58,175 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 08:15:37,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2350, loss[loss=0.08442, beats_loss=0.01377, ecapa_loss=0.000286, whisper_loss=0.06778, over 14751.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01232, ecapa_loss=0.0002745, whisper_loss=0.09543, over 3832638.06 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:15:46,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=458270.0, ans=0.125 2024-08-10 08:15:47,145 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 08:15:58,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=458370.0, ans=0.2 2024-08-10 08:16:00,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458370.0, ans=0.125 2024-08-10 08:16:01,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-10 08:16:02,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. 
limit=15.0 2024-08-10 08:16:17,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458470.0, ans=0.125 2024-08-10 08:16:21,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=458470.0, ans=0.0 2024-08-10 08:16:22,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=458470.0, ans=0.125 2024-08-10 08:16:25,053 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 08:16:29,700 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 08:16:30,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=458570.0, ans=0.0 2024-08-10 08:16:32,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-08-10 08:16:44,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458670.0, ans=0.1 2024-08-10 08:16:51,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2024-08-10 08:16:56,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-08-10 08:16:56,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2400, loss[loss=0.09232, beats_loss=0.01479, ecapa_loss=0.000237, whisper_loss=0.07516, over 20297.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01221, ecapa_loss=0.0002762, whisper_loss=0.09593, over 3813163.08 frames. 
], batch size: 83, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:16:57,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2024-08-10 08:17:07,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-08-10 08:17:29,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.868e+01 3.229e+01 3.686e+01 5.514e+01, threshold=6.458e+01, percent-clipped=0.0 2024-08-10 08:17:48,495 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 08:18:18,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2450, loss[loss=0.1105, beats_loss=0.01332, ecapa_loss=0.000242, whisper_loss=0.09471, over 22568.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01223, ecapa_loss=0.0002753, whisper_loss=0.09582, over 3824333.14 frames. ], batch size: 91, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:18:19,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=459270.0, ans=0.0 2024-08-10 08:18:19,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.25 vs. 
limit=22.5 2024-08-10 08:18:23,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=459270.0, ans=0.0 2024-08-10 08:18:27,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459270.0, ans=0.125 2024-08-10 08:18:39,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=459370.0, ans=0.2 2024-08-10 08:18:44,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459370.0, ans=0.125 2024-08-10 08:18:55,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=459470.0, ans=0.0 2024-08-10 08:19:04,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=459570.0, ans=0.125 2024-08-10 08:19:07,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=459570.0, ans=0.0 2024-08-10 08:19:41,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2500, loss[loss=0.1098, beats_loss=0.01317, ecapa_loss=0.0002614, whisper_loss=0.09405, over 22540.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01224, ecapa_loss=0.0002761, whisper_loss=0.09615, over 3857913.88 frames. ], batch size: 91, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:20:31,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.999e+01 3.542e+01 3.925e+01 6.520e+01, threshold=7.085e+01, percent-clipped=1.0 2024-08-10 08:20:56,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=460070.0, ans=0.2 2024-08-10 08:21:09,427 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 08:21:14,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=460170.0, ans=0.125 2024-08-10 08:21:16,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=460170.0, ans=0.125 2024-08-10 08:21:24,269 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 08:21:25,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2550, loss[loss=0.09879, beats_loss=0.01421, ecapa_loss=0.0003094, whisper_loss=0.08149, over 21133.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01212, ecapa_loss=0.0002776, whisper_loss=0.0968, over 3879432.07 frames. ], batch size: 91, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:21:28,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=460270.0, ans=0.0 2024-08-10 08:21:42,622 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 08:21:42,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460370.0, ans=0.1 2024-08-10 08:22:14,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=460470.0, ans=0.0 2024-08-10 08:22:33,674 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 08:22:41,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=460570.0, ans=0.125 2024-08-10 08:22:41,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460570.0, ans=0.0 2024-08-10 08:22:43,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=460570.0, ans=0.125 2024-08-10 08:22:48,002 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 08:22:59,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0 2024-08-10 08:23:08,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2600, loss[loss=0.09579, beats_loss=0.015, ecapa_loss=0.0002046, whisper_loss=0.07875, over 20582.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01217, ecapa_loss=0.0002756, whisper_loss=0.09706, over 3874724.38 frames. ], batch size: 81, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:23:22,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460770.0, ans=0.0 2024-08-10 08:23:33,511 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 08:23:40,989 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
21 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 08:23:50,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=460870.0, ans=0.125 2024-08-10 08:24:01,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 3.079e+01 3.425e+01 3.855e+01 5.495e+01, threshold=6.850e+01, percent-clipped=0.0 2024-08-10 08:24:30,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=461070.0, ans=0.125 2024-08-10 08:24:38,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=461170.0, ans=0.125 2024-08-10 08:24:58,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=461170.0, ans=0.0 2024-08-10 08:25:03,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2650, loss[loss=0.09404, beats_loss=0.01337, ecapa_loss=0.0003143, whisper_loss=0.07753, over 20738.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.0122, ecapa_loss=0.0002739, whisper_loss=0.09658, over 3908949.98 frames. ], batch size: 88, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:25:09,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461270.0, ans=0.1 2024-08-10 08:25:11,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=461270.0, ans=0.125 2024-08-10 08:25:23,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=461270.0, ans=0.0 2024-08-10 08:25:35,731 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 08:25:44,938 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
11 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 08:26:21,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=461570.0, ans=0.2 2024-08-10 08:26:40,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=461670.0, ans=0.0 2024-08-10 08:26:51,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=461670.0, ans=0.2 2024-08-10 08:26:57,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2700, loss[loss=0.1005, beats_loss=0.01432, ecapa_loss=0.0002147, whisper_loss=0.084, over 21238.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01223, ecapa_loss=0.000277, whisper_loss=0.09599, over 3889336.32 frames. ], batch size: 84, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:27:07,203 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 08:27:38,067 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 08:27:38,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=461870.0, ans=0.125 2024-08-10 08:27:45,087 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 08:27:48,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 3.222e+01 3.601e+01 4.234e+01 3.838e+02, threshold=7.201e+01, percent-clipped=7.0 2024-08-10 08:27:55,703 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 08:27:59,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=462070.0, ans=0.125 2024-08-10 08:28:30,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2750, loss[loss=0.1217, beats_loss=0.01296, ecapa_loss=0.0002814, whisper_loss=0.1059, over 22784.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01226, ecapa_loss=0.0002778, whisper_loss=0.09642, over 3864056.71 frames. ], batch size: 92, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:28:40,635 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 08:28:47,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462370.0, ans=0.125 2024-08-10 08:28:52,679 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 08:28:55,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=462370.0, ans=0.125 2024-08-10 08:29:01,488 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 08:29:09,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=462470.0, ans=0.0 2024-08-10 08:29:19,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=462570.0, ans=0.0 2024-08-10 08:29:24,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=462570.0, ans=0.025 2024-08-10 08:29:45,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2800, loss[loss=0.1322, beats_loss=0.01011, ecapa_loss=0.0003294, whisper_loss=0.1188, over 19361.00 frames. 
], tot_loss[loss=0.1115, beats_loss=0.01229, ecapa_loss=0.0002766, whisper_loss=0.09647, over 3868862.59 frames. ], batch size: 79, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:29:48,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=462770.0, ans=0.125 2024-08-10 08:29:49,188 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 08:29:50,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.98 vs. limit=10.0 2024-08-10 08:30:14,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=462970.0, ans=0.125 2024-08-10 08:30:19,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.197e+01 3.685e+01 4.218e+01 5.823e+01, threshold=7.371e+01, percent-clipped=0.0 2024-08-10 08:30:31,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=463070.0, ans=0.0 2024-08-10 08:30:46,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463170.0, ans=0.1 2024-08-10 08:30:51,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=463170.0, ans=0.1 2024-08-10 08:30:56,638 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 08:31:00,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=463270.0, ans=0.0 2024-08-10 08:31:01,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2850, loss[loss=0.1116, beats_loss=0.012, ecapa_loss=0.0003384, whisper_loss=0.09626, over 17429.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01228, ecapa_loss=0.0002754, whisper_loss=0.0963, over 3873679.18 frames. ], batch size: 72, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:31:11,980 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-10 08:31:16,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=463370.0, ans=0.125 2024-08-10 08:31:18,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0 2024-08-10 08:31:37,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=463470.0, ans=0.2 2024-08-10 08:31:51,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463570.0, ans=0.1 2024-08-10 08:31:53,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=463570.0, ans=0.0 2024-08-10 08:32:14,815 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 08:32:24,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2900, loss[loss=0.1362, beats_loss=0.01006, ecapa_loss=0.0003381, whisper_loss=0.1228, over 15545.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01241, ecapa_loss=0.0002745, whisper_loss=0.09611, over 3914384.20 frames. 
], batch size: 66, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:32:33,799 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 08:32:47,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-08-10 08:33:05,064 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:33:05,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.004e+01 3.404e+01 3.788e+01 1.422e+02, threshold=6.807e+01, percent-clipped=1.0 2024-08-10 08:33:13,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=463970.0, ans=0.125 2024-08-10 08:33:16,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 08:33:36,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=464170.0, ans=0.125 2024-08-10 08:33:38,491 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-10 08:33:42,246 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 08:33:53,299 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 08:33:54,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=464270.0, ans=0.0 2024-08-10 08:33:55,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 2950, loss[loss=0.07744, beats_loss=0.01467, ecapa_loss=0.000289, whisper_loss=0.05988, over 22165.00 frames. 
], tot_loss[loss=0.1108, beats_loss=0.01229, ecapa_loss=0.0002757, whisper_loss=0.0958, over 3891836.41 frames. ], batch size: 95, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:34:00,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.47 vs. limit=22.5 2024-08-10 08:34:04,086 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 08:34:04,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=464270.0, ans=0.125 2024-08-10 08:34:06,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2024-08-10 08:34:21,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. limit=10.0 2024-08-10 08:34:28,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-10 08:34:32,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=464370.0, ans=0.2 2024-08-10 08:34:42,360 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 08:34:48,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. 
limit=15.0 2024-08-10 08:35:22,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=464670.0, ans=0.2 2024-08-10 08:35:27,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3000, loss[loss=0.08566, beats_loss=0.01383, ecapa_loss=0.0002895, whisper_loss=0.06893, over 13140.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01232, ecapa_loss=0.0002754, whisper_loss=0.09605, over 3913453.15 frames. ], batch size: 54, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:35:27,859 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 08:35:53,650 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0074, 0.0207, 0.0027, 1.1000, 0.0112, 0.0280, 0.0280, 0.0246], device='cuda:3') 2024-08-10 08:36:05,698 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on ASR_libri: loss=0.2648, beats_loss=0, ecapa_loss=0.0008316, whisper_loss=0.2565, over 922467.00 frames. 2024-08-10 08:36:23,237 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on SV_voxceleb1: loss=0.007277, beats_loss=0, ecapa_loss=0.0007277, whisper_loss=0, over 939242.00 frames. 2024-08-10 08:38:19,709 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on AT_audioset: loss=0.0279, beats_loss=0.0279, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 08:38:19,712 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 08:38:23,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.63 vs. 
limit=12.0 2024-08-10 08:38:28,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=464770.0, ans=0.125 2024-08-10 08:38:37,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464870.0, ans=0.1 2024-08-10 08:38:39,551 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 08:38:57,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 3.167e+01 3.615e+01 4.298e+01 8.066e+01, threshold=7.230e+01, percent-clipped=1.0 2024-08-10 08:39:03,889 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-10 08:39:17,134 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-10 08:39:29,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2024-08-10 08:39:33,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=465170.0, ans=0.0 2024-08-10 08:39:34,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-08-10 08:39:34,913 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-10 08:39:40,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3050, loss[loss=0.1116, beats_loss=0.01144, ecapa_loss=0.0003106, whisper_loss=0.09703, over 21167.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01232, ecapa_loss=0.0002767, whisper_loss=0.09632, over 3910779.15 frames. 
], batch size: 84, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:39:43,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=465270.0, ans=0.0 2024-08-10 08:39:46,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465270.0, ans=0.1 2024-08-10 08:39:49,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=465270.0, ans=0.0 2024-08-10 08:39:52,405 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 08:40:24,840 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.066e-03 2024-08-10 08:40:39,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=465570.0, ans=0.0 2024-08-10 08:40:48,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=465670.0, ans=0.2 2024-08-10 08:41:01,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.95 vs. limit=10.0 2024-08-10 08:41:03,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3100, loss[loss=0.1123, beats_loss=0.01201, ecapa_loss=0.0002859, whisper_loss=0.09747, over 22814.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01229, ecapa_loss=0.0002796, whisper_loss=0.09691, over 3904462.55 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:41:22,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465870.0, ans=0.125 2024-08-10 08:41:32,296 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 08:41:40,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465970.0, ans=0.125 2024-08-10 08:41:43,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.398e+01 3.878e+01 4.582e+01 1.719e+02, threshold=7.756e+01, percent-clipped=2.0 2024-08-10 08:41:52,675 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 08:41:58,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=466070.0, ans=0.5 2024-08-10 08:42:13,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0 2024-08-10 08:42:33,420 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3150, loss[loss=0.1118, beats_loss=0.01138, ecapa_loss=0.0002761, whisper_loss=0.09766, over 15591.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01212, ecapa_loss=0.0002799, whisper_loss=0.0976, over 3890992.61 frames. ], batch size: 61, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:42:54,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=466370.0, ans=0.0 2024-08-10 08:43:09,042 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 28 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 08:43:20,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=466470.0, ans=0.0 2024-08-10 08:43:22,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466470.0, ans=0.1 2024-08-10 08:43:26,300 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 08:43:34,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=466570.0, ans=15.0 2024-08-10 08:43:46,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-10 08:43:49,289 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 08:43:58,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3200, loss[loss=0.09774, beats_loss=0.01129, ecapa_loss=0.0003805, whisper_loss=0.08264, over 20068.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01208, ecapa_loss=0.0002803, whisper_loss=0.09794, over 3886273.63 frames. ], batch size: 89, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:44:07,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-10 08:44:08,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=466770.0, ans=0.0 2024-08-10 08:44:08,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=466770.0, ans=0.125 2024-08-10 08:44:09,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=10.0 2024-08-10 08:44:17,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466870.0, ans=0.1 2024-08-10 08:44:25,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=466870.0, ans=0.125 2024-08-10 08:44:40,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.101e+01 3.705e+01 4.309e+01 1.166e+02, threshold=7.411e+01, percent-clipped=1.0 2024-08-10 08:44:49,100 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 08:44:51,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466970.0, ans=0.1 2024-08-10 08:44:52,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466970.0, ans=0.0 2024-08-10 08:45:02,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467070.0, ans=0.1 2024-08-10 08:45:12,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=467070.0, ans=0.1 2024-08-10 08:45:19,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=467170.0, ans=0.125 2024-08-10 08:45:32,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3250, loss[loss=0.1202, beats_loss=0.01377, ecapa_loss=0.0002496, whisper_loss=0.104, over 22790.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01216, ecapa_loss=0.0002783, whisper_loss=0.09819, over 3886520.35 frames. 
], batch size: 90, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:45:43,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=467270.0, ans=0.125 2024-08-10 08:46:34,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-08-10 08:46:36,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=467570.0, ans=0.125 2024-08-10 08:46:47,380 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 08:46:47,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2024-08-10 08:46:53,462 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 08:46:59,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2024-08-10 08:47:04,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3300, loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.0003018, whisper_loss=0.09053, over 15818.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01223, ecapa_loss=0.0002772, whisper_loss=0.09743, over 3867686.14 frames. ], batch size: 60, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:47:13,492 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 08:47:14,602 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 08:47:40,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467970.0, ans=0.1 2024-08-10 08:47:46,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=467970.0, ans=0.125 2024-08-10 08:47:46,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 3.041e+01 3.344e+01 3.812e+01 6.169e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 08:47:52,601 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 35 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 08:48:03,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=468070.0, ans=0.125 2024-08-10 08:48:20,491 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 08:48:24,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468170.0, ans=0.125 2024-08-10 08:48:34,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3350, loss[loss=0.1167, beats_loss=0.009484, ecapa_loss=0.0003336, whisper_loss=0.1039, over 20201.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01211, ecapa_loss=0.0002777, whisper_loss=0.09769, over 3868433.76 frames. 
], batch size: 83, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:48:47,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468270.0, ans=0.1 2024-08-10 08:48:53,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468370.0, ans=0.1 2024-08-10 08:49:17,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=468470.0, ans=0.2 2024-08-10 08:49:39,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=468570.0, ans=0.125 2024-08-10 08:49:41,324 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 08:49:49,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468670.0, ans=0.125 2024-08-10 08:49:58,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3400, loss[loss=0.09464, beats_loss=0.01324, ecapa_loss=0.0003333, whisper_loss=0.07807, over 19956.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01216, ecapa_loss=0.0002767, whisper_loss=0.0971, over 3869968.21 frames. ], batch size: 87, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:50:34,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.156e+01 3.587e+01 4.181e+01 1.855e+02, threshold=7.174e+01, percent-clipped=2.0 2024-08-10 08:50:39,158 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 08:50:41,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=468970.0, ans=0.125 2024-08-10 08:50:47,122 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 08:50:53,480 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 08:50:53,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=469070.0, ans=0.2 2024-08-10 08:51:01,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-10 08:51:11,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-08-10 08:51:11,982 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 08:51:15,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3450, loss[loss=0.09219, beats_loss=0.01295, ecapa_loss=0.000277, whisper_loss=0.07646, over 19584.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01217, ecapa_loss=0.0002776, whisper_loss=0.09651, over 3860716.78 frames. ], batch size: 79, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:51:29,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=12.0 2024-08-10 08:51:31,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469370.0, ans=0.1 2024-08-10 08:52:04,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=469570.0, ans=0.0 2024-08-10 08:52:25,642 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 17 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 08:52:29,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3500, loss[loss=0.1437, beats_loss=0.01088, ecapa_loss=0.0002563, whisper_loss=0.1303, over 18526.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01223, ecapa_loss=0.0002778, whisper_loss=0.09668, over 3861495.76 frames. ], batch size: 68, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:52:32,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2024-08-10 08:52:42,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469770.0, ans=0.1 2024-08-10 08:52:47,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2024-08-10 08:53:04,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 3.037e+01 3.390e+01 3.981e+01 6.541e+01, threshold=6.780e+01, percent-clipped=0.0 2024-08-10 08:53:09,229 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 08:53:09,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=469970.0, ans=0.125 2024-08-10 08:53:15,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=470070.0, ans=0.125 2024-08-10 08:53:44,582 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3550, loss[loss=0.1071, beats_loss=0.01351, ecapa_loss=0.000229, whisper_loss=0.0913, over 14015.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01219, ecapa_loss=0.000278, whisper_loss=0.09638, over 3856811.09 frames. ], batch size: 54, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:53:57,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=470370.0, ans=0.0 2024-08-10 08:54:00,431 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 08:54:06,657 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 08:54:12,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=470470.0, ans=0.2 2024-08-10 08:54:25,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=470470.0, ans=0.125 2024-08-10 08:54:35,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=470570.0, ans=0.5 2024-08-10 08:54:37,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.46 vs. limit=22.5 2024-08-10 08:54:38,847 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.554e+00 2024-08-10 08:54:43,025 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 08:54:57,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3600, loss[loss=0.1046, beats_loss=0.01583, ecapa_loss=0.0001867, whisper_loss=0.08694, over 17251.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01219, ecapa_loss=0.0002768, whisper_loss=0.09654, over 3866952.88 frames. 
], batch size: 66, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:55:10,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=470870.0, ans=0.125 2024-08-10 08:55:15,994 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:55:21,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=470870.0, ans=0.125 2024-08-10 08:55:22,767 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 08:55:24,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=470970.0, ans=0.5 2024-08-10 08:55:28,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470970.0, ans=0.1 2024-08-10 08:55:29,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.993e+01 3.332e+01 3.946e+01 5.463e+01, threshold=6.665e+01, percent-clipped=0.0 2024-08-10 08:55:38,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=470970.0, ans=0.125 2024-08-10 08:55:46,085 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 08:56:10,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3650, loss[loss=0.1108, beats_loss=0.01756, ecapa_loss=0.0002016, whisper_loss=0.09127, over 18457.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01225, ecapa_loss=0.0002762, whisper_loss=0.09585, over 3854957.96 frames. ], batch size: 71, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:56:13,709 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 08:56:32,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2024-08-10 08:56:41,420 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 08:56:45,388 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 08:56:51,104 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 08:57:14,992 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 08:57:16,273 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 08:57:20,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3700, loss[loss=0.105, beats_loss=0.01241, ecapa_loss=0.0002447, whisper_loss=0.09019, over 19702.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01212, ecapa_loss=0.0002785, whisper_loss=0.09661, over 3846378.23 frames. ], batch size: 78, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:57:27,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471770.0, ans=0.1 2024-08-10 08:57:31,214 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 08:57:38,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=471870.0, ans=0.125 2024-08-10 08:57:47,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. 
limit=15.0 2024-08-10 08:57:51,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471970.0, ans=0.0 2024-08-10 08:57:51,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 3.070e+01 3.607e+01 4.290e+01 1.526e+02, threshold=7.214e+01, percent-clipped=4.0 2024-08-10 08:57:57,280 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 08:58:14,200 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 08:58:16,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2024-08-10 08:58:22,153 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-10 08:58:27,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3750, loss[loss=0.09404, beats_loss=0.01506, ecapa_loss=0.000271, whisper_loss=0.07627, over 20827.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01225, ecapa_loss=0.0002801, whisper_loss=0.09589, over 3868391.67 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:58:30,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=472270.0, ans=10.0 2024-08-10 08:58:35,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-10 08:58:36,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=472270.0, ans=0.0 2024-08-10 08:58:51,461 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 08:59:02,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=472470.0, ans=0.125 2024-08-10 08:59:19,123 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 08:59:39,912 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 08:59:40,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3800, loss[loss=0.1303, beats_loss=0.009289, ecapa_loss=0.0002974, whisper_loss=0.1181, over 23893.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01214, ecapa_loss=0.000282, whisper_loss=0.09772, over 3894066.87 frames. ], batch size: 94, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:59:41,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=472770.0, ans=0.125 2024-08-10 08:59:46,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=472770.0, ans=0.0 2024-08-10 08:59:53,537 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 29 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-10 08:59:54,747 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 09:00:05,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-08-10 09:00:13,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.142e+01 3.387e+01 4.333e+01 6.143e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-10 09:00:52,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3850, loss[loss=0.1058, beats_loss=0.01232, ecapa_loss=0.0003045, whisper_loss=0.09041, over 20874.00 frames. 
], tot_loss[loss=0.1122, beats_loss=0.01215, ecapa_loss=0.0002795, whisper_loss=0.09725, over 3868525.31 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:55,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=473270.0, ans=0.125 2024-08-10 09:00:57,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473270.0, ans=0.1 2024-08-10 09:01:07,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=473370.0, ans=0.0 2024-08-10 09:01:26,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2024-08-10 09:01:29,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=22.5 2024-08-10 09:01:42,504 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 09:01:44,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473570.0, ans=0.125 2024-08-10 09:01:44,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=473570.0, ans=0.125 2024-08-10 09:01:59,636 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 09:02:04,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3900, loss[loss=0.09492, beats_loss=0.01517, ecapa_loss=0.0002196, whisper_loss=0.07755, over 22299.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01216, ecapa_loss=0.0002787, whisper_loss=0.09722, over 3859708.61 frames. 
], batch size: 90, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:02:22,428 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 09:02:31,936 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 09:02:37,675 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 09:02:38,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+01 3.153e+01 3.691e+01 4.376e+01 6.503e+01, threshold=7.382e+01, percent-clipped=0.0 2024-08-10 09:02:39,379 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 09:02:42,001 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 09:03:07,965 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 09:03:11,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=474170.0, ans=0.125 2024-08-10 09:03:15,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=474170.0, ans=0.125 2024-08-10 09:03:17,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 3950, loss[loss=0.1175, beats_loss=0.01132, ecapa_loss=0.0003102, whisper_loss=0.1031, over 21241.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01206, ecapa_loss=0.0002828, whisper_loss=0.09805, over 3867609.95 frames. ], batch size: 88, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:03:17,948 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 09:03:36,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0 2024-08-10 09:03:54,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=474470.0, ans=0.125 2024-08-10 09:04:02,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=474570.0, ans=0.05 2024-08-10 09:04:04,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=474570.0, ans=0.125 2024-08-10 09:04:06,737 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 09:04:20,513 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 09:04:22,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=15.0 2024-08-10 09:04:28,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4000, loss[loss=0.1313, beats_loss=0.009294, ecapa_loss=0.0003087, whisper_loss=0.1189, over 20859.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01197, ecapa_loss=0.0002832, whisper_loss=0.09848, over 3885391.21 frames. ], batch size: 80, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:04:29,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=474770.0, ans=0.125 2024-08-10 09:04:39,135 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 09:04:39,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=474770.0, ans=0.2 2024-08-10 09:04:50,671 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 09:04:52,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=474870.0, ans=0.0 2024-08-10 09:05:02,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 3.204e+01 3.613e+01 4.111e+01 7.755e+01, threshold=7.226e+01, percent-clipped=1.0 2024-08-10 09:05:15,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2024-08-10 09:05:27,458 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 09:05:38,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2024-08-10 09:05:43,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4050, loss[loss=0.1565, beats_loss=0.008708, ecapa_loss=0.0002885, whisper_loss=0.1449, over 21639.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01191, ecapa_loss=0.0002842, whisper_loss=0.09909, over 3905286.81 frames. ], batch size: 81, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:06:21,709 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 09:06:31,414 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 09:06:40,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475570.0, ans=0.125 2024-08-10 09:06:47,195 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 09:06:57,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4100, loss[loss=0.1132, beats_loss=0.01329, ecapa_loss=0.0002499, whisper_loss=0.09745, over 23354.00 frames. 
], tot_loss[loss=0.114, beats_loss=0.01189, ecapa_loss=0.0002819, whisper_loss=0.09927, over 3911188.89 frames. ], batch size: 93, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:07:06,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=475770.0, ans=0.125 2024-08-10 09:07:11,018 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 09:07:15,359 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 09:07:20,521 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.518e-03 2024-08-10 09:07:30,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475970.0, ans=0.1 2024-08-10 09:07:34,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 3.046e+01 3.447e+01 3.852e+01 5.765e+01, threshold=6.895e+01, percent-clipped=0.0 2024-08-10 09:07:36,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475970.0, ans=0.04949747468305833 2024-08-10 09:07:47,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=476070.0, ans=0.025 2024-08-10 09:07:59,074 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 14 from Vox, 25 from AS 2024-08-10 09:08:08,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=476170.0, ans=0.0 2024-08-10 09:08:14,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=476270.0, ans=0.125 2024-08-10 09:08:15,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4150, loss[loss=0.1305, beats_loss=0.009956, ecapa_loss=0.0002915, whisper_loss=0.1176, over 19729.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.0119, ecapa_loss=0.0002808, whisper_loss=0.099, over 3898063.29 frames. ], batch size: 76, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:08:23,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=476270.0, ans=0.0 2024-08-10 09:08:51,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-08-10 09:09:00,764 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 from AS 2024-08-10 09:09:07,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2024-08-10 09:09:10,496 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 from AS 2024-08-10 09:09:38,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4200, loss[loss=0.1118, beats_loss=0.01182, ecapa_loss=0.0002717, whisper_loss=0.09728, over 19063.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.012, ecapa_loss=0.0002805, whisper_loss=0.09805, over 3889732.59 frames. 
], batch size: 72, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:09:46,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=476770.0, ans=0.125 2024-08-10 09:09:46,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=476770.0, ans=0.0 2024-08-10 09:09:54,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=476870.0, ans=0.125 2024-08-10 09:10:06,025 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-10 09:10:08,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=476970.0, ans=0.0 2024-08-10 09:10:12,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 3.164e+01 3.633e+01 4.360e+01 6.348e+01, threshold=7.265e+01, percent-clipped=0.0 2024-08-10 09:10:34,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=477070.0, ans=0.125 2024-08-10 09:10:43,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=477170.0, ans=0.125 2024-08-10 09:10:54,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477170.0, ans=0.1 2024-08-10 09:10:56,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4250, loss[loss=0.1101, beats_loss=0.01417, ecapa_loss=0.0002526, whisper_loss=0.09342, over 22224.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01209, ecapa_loss=0.0002778, whisper_loss=0.09732, over 3885974.14 frames. 
], batch size: 90, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:10:57,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477270.0, ans=0.125 2024-08-10 09:11:01,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-10 09:11:09,720 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 09:11:15,376 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 27 from Vox, 33 from AS 2024-08-10 09:11:17,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-10 09:11:26,737 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-10 09:11:41,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=477470.0, ans=10.0 2024-08-10 09:11:59,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=477570.0, ans=0.125 2024-08-10 09:12:16,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4300, loss[loss=0.1122, beats_loss=0.01167, ecapa_loss=0.0002859, whisper_loss=0.09772, over 23690.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01202, ecapa_loss=0.0002766, whisper_loss=0.09742, over 3868752.94 frames. 
], batch size: 95, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:12:24,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=477770.0, ans=0.035 2024-08-10 09:12:54,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.896e+01 3.194e+01 3.711e+01 5.609e+01, threshold=6.388e+01, percent-clipped=0.0 2024-08-10 09:13:09,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-10 09:13:10,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=478070.0, ans=0.125 2024-08-10 09:13:28,311 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 09:13:29,927 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 09:13:34,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4350, loss[loss=0.1174, beats_loss=0.01269, ecapa_loss=0.0003289, whisper_loss=0.1014, over 17034.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01206, ecapa_loss=0.0002773, whisper_loss=0.09748, over 3862427.70 frames. ], batch size: 71, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:13:54,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=478370.0, ans=0.0 2024-08-10 09:14:07,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=478470.0, ans=0.0 2024-08-10 09:14:20,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=478570.0, ans=0.2 2024-08-10 09:14:32,231 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
28 from LS+wenet, 20 from Vox, 46 from AS 2024-08-10 09:14:45,585 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 from AS 2024-08-10 09:14:47,304 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 09:14:50,454 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 09:14:50,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478770.0, ans=0.1 2024-08-10 09:14:51,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4400, loss[loss=0.1256, beats_loss=0.01202, ecapa_loss=0.0003142, whisper_loss=0.1105, over 21468.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01219, ecapa_loss=0.0002754, whisper_loss=0.09678, over 3893692.61 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:14:56,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=478770.0, ans=0.0 2024-08-10 09:15:04,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=478770.0, ans=0.125 2024-08-10 09:15:11,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=478870.0, ans=0.125 2024-08-10 09:15:24,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=478970.0, ans=0.5 2024-08-10 09:15:25,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 3.041e+01 3.447e+01 3.976e+01 9.860e+01, threshold=6.894e+01, percent-clipped=1.0 2024-08-10 09:15:28,382 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
13 from LS+wenet, 15 from Vox, 39 from AS 2024-08-10 09:15:29,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=478970.0, ans=0.125 2024-08-10 09:15:43,000 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 09:16:04,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4450, loss[loss=0.1093, beats_loss=0.01005, ecapa_loss=0.0003893, whisper_loss=0.09539, over 22090.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01223, ecapa_loss=0.0002769, whisper_loss=0.09656, over 3913111.78 frames. ], batch size: 94, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:16:09,936 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 from AS 2024-08-10 09:16:16,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=479370.0, ans=0.0 2024-08-10 09:16:20,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=479370.0, ans=0.2 2024-08-10 09:16:20,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-10 09:16:26,649 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 24 from Vox, 27 from AS 2024-08-10 09:16:29,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=479470.0, ans=0.0 2024-08-10 09:17:04,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=479670.0, ans=0.2 2024-08-10 09:17:05,565 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 11 from Vox, 47 from AS 2024-08-10 09:17:11,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4500, loss[loss=0.1184, beats_loss=0.0114, ecapa_loss=0.0002494, whisper_loss=0.1045, over 22648.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01232, ecapa_loss=0.0002763, whisper_loss=0.09613, over 3903846.91 frames. ], batch size: 89, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:17:13,532 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 09:17:13,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479770.0, ans=0.1 2024-08-10 09:17:18,361 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 from AS 2024-08-10 09:17:24,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=479870.0, ans=0.0 2024-08-10 09:17:35,181 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 09:17:44,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 3.224e+01 3.675e+01 4.252e+01 6.669e+01, threshold=7.350e+01, percent-clipped=1.0 2024-08-10 09:18:00,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=480070.0, ans=0.0 2024-08-10 09:18:20,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4550, loss[loss=0.1341, beats_loss=0.01061, ecapa_loss=0.0002927, whisper_loss=0.1205, over 24108.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01225, ecapa_loss=0.0002781, whisper_loss=0.09691, over 3907826.53 frames. ], batch size: 91, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:18:20,544 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
14 from LS+wenet, 14 from Vox, 38 from AS 2024-08-10 09:18:28,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=480270.0, ans=0.2 2024-08-10 09:18:29,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480270.0, ans=0.1 2024-08-10 09:18:29,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=15.0 2024-08-10 09:18:39,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480370.0, ans=0.125 2024-08-10 09:18:43,411 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 32 from Vox, 39 from AS 2024-08-10 09:18:45,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2024-08-10 09:18:53,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480470.0, ans=0.1 2024-08-10 09:19:10,940 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 09:19:27,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4600, loss[loss=0.1026, beats_loss=0.01249, ecapa_loss=0.000313, whisper_loss=0.08693, over 19911.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01222, ecapa_loss=0.000278, whisper_loss=0.09663, over 3887802.69 frames. ], batch size: 86, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:19:30,286 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 18 from Vox, 17 from AS 2024-08-10 09:19:39,196 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
20 from LS+wenet, 30 from Vox, 46 from AS 2024-08-10 09:19:52,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.35 vs. limit=22.5 2024-08-10 09:19:56,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.144e+01 3.622e+01 4.296e+01 6.398e+01, threshold=7.244e+01, percent-clipped=0.0 2024-08-10 09:20:08,483 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 09:20:11,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=481070.0, ans=0.125 2024-08-10 09:20:20,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=481170.0, ans=0.125 2024-08-10 09:20:26,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481170.0, ans=0.1 2024-08-10 09:20:32,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4650, loss[loss=0.1245, beats_loss=0.0101, ecapa_loss=0.0002912, whisper_loss=0.1115, over 18862.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01215, ecapa_loss=0.0002793, whisper_loss=0.09704, over 3912080.17 frames. ], batch size: 75, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:20:35,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481270.0, ans=0.1 2024-08-10 09:20:39,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481270.0, ans=0.125 2024-08-10 09:20:46,000 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 09:20:53,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=481370.0, ans=0.0 2024-08-10 09:21:01,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=481470.0, ans=0.05 2024-08-10 09:21:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481470.0, ans=0.1 2024-08-10 09:21:11,915 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 23 from Vox, 28 from AS 2024-08-10 09:21:14,341 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 09:21:23,009 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 09:21:26,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=481670.0, ans=0.0 2024-08-10 09:21:29,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.17 vs. limit=15.0 2024-08-10 09:21:31,162 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 09:21:37,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4700, loss[loss=0.1004, beats_loss=0.01285, ecapa_loss=0.0002836, whisper_loss=0.08474, over 18122.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01215, ecapa_loss=0.0002779, whisper_loss=0.09727, over 3900652.03 frames. 
], batch size: 75, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:22:07,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 3.095e+01 3.461e+01 3.864e+01 6.358e+01, threshold=6.922e+01, percent-clipped=0.0 2024-08-10 09:22:20,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=482070.0, ans=0.125 2024-08-10 09:22:23,752 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 9 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 09:22:25,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=482070.0, ans=0.125 2024-08-10 09:22:28,318 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 32 from Vox, 42 from AS 2024-08-10 09:22:35,897 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-10 09:22:42,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4750, loss[loss=0.09418, beats_loss=0.01354, ecapa_loss=0.0002396, whisper_loss=0.07825, over 23480.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01223, ecapa_loss=0.0002776, whisper_loss=0.09602, over 3876922.46 frames. ], batch size: 93, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:22:42,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482270.0, ans=0.1 2024-08-10 09:22:55,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=482370.0, ans=0.0 2024-08-10 09:23:19,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.28 vs. 
limit=15.0 2024-08-10 09:23:27,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482570.0, ans=0.1 2024-08-10 09:23:35,879 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-10 09:23:45,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4800, loss[loss=0.1005, beats_loss=0.0139, ecapa_loss=0.000246, whisper_loss=0.08412, over 20362.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01225, ecapa_loss=0.0002787, whisper_loss=0.09621, over 3893009.60 frames. ], batch size: 79, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:23:49,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=482770.0, ans=0.0 2024-08-10 09:23:53,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482770.0, ans=0.1 2024-08-10 09:23:56,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482770.0, ans=0.1 2024-08-10 09:24:00,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=482870.0, ans=0.125 2024-08-10 09:24:09,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=482870.0, ans=0.125 2024-08-10 09:24:14,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=482970.0, ans=0.2 2024-08-10 09:24:14,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=482970.0, ans=0.2 2024-08-10 09:24:14,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 3.085e+01 3.419e+01 4.209e+01 9.011e+01, threshold=6.838e+01, percent-clipped=2.0 2024-08-10 09:24:24,157 INFO 
[train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 from AS 2024-08-10 09:24:25,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=483070.0, ans=0.2 2024-08-10 09:24:27,726 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 from AS 2024-08-10 09:24:30,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483070.0, ans=0.1 2024-08-10 09:24:35,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=483170.0, ans=0.125 2024-08-10 09:24:38,718 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 09:24:47,975 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 09:24:49,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4850, loss[loss=0.1246, beats_loss=0.01092, ecapa_loss=0.0002714, whisper_loss=0.111, over 21952.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01231, ecapa_loss=0.000278, whisper_loss=0.09642, over 3884741.01 frames. ], batch size: 87, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:24:55,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. 
limit=10.0 2024-08-10 09:25:18,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=483470.0, ans=0.125 2024-08-10 09:25:41,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483570.0, ans=0.125 2024-08-10 09:25:45,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=483570.0, ans=0.04949747468305833 2024-08-10 09:25:49,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2024-08-10 09:25:53,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=483670.0, ans=0.0 2024-08-10 09:25:57,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=483670.0, ans=0.125 2024-08-10 09:26:02,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4900, loss[loss=0.1122, beats_loss=0.009988, ecapa_loss=0.0002599, whisper_loss=0.0996, over 19552.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.0122, ecapa_loss=0.0002767, whisper_loss=0.09659, over 3889863.99 frames. 
], batch size: 76, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:26:05,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=483770.0, ans=0.125 2024-08-10 09:26:10,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=483770.0, ans=0.07 2024-08-10 09:26:17,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=483870.0, ans=0.0 2024-08-10 09:26:17,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=483870.0, ans=0.1 2024-08-10 09:26:34,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=483970.0, ans=0.125 2024-08-10 09:26:39,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 3.191e+01 3.639e+01 4.118e+01 6.849e+01, threshold=7.278e+01, percent-clipped=1.0 2024-08-10 09:26:53,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-10 09:26:56,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=484070.0, ans=0.125 2024-08-10 09:27:01,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=484070.0, ans=0.0 2024-08-10 09:27:09,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=484170.0, ans=0.0 2024-08-10 09:27:16,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. 
limit=15.0 2024-08-10 09:27:29,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 4950, loss[loss=0.103, beats_loss=0.01416, ecapa_loss=0.0002742, whisper_loss=0.08607, over 13875.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01222, ecapa_loss=0.0002775, whisper_loss=0.09592, over 3863074.65 frames. ], batch size: 56, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:27:41,382 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.021e-03 2024-08-10 09:28:05,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=484370.0, ans=0.125 2024-08-10 09:28:10,357 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 33 from Vox, 28 from AS 2024-08-10 09:28:14,608 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 09:28:53,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=484670.0, ans=0.125 2024-08-10 09:29:06,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5000, loss[loss=0.1163, beats_loss=0.01424, ecapa_loss=0.0001856, whisper_loss=0.1002, over 23749.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01214, ecapa_loss=0.0002783, whisper_loss=0.09695, over 3866762.20 frames. ], batch size: 89, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:29:15,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-08-10 09:29:41,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. 
limit=15.0 2024-08-10 09:29:52,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+01 3.034e+01 3.424e+01 4.085e+01 5.403e+01, threshold=6.848e+01, percent-clipped=0.0 2024-08-10 09:29:59,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.72 vs. limit=10.0 2024-08-10 09:30:44,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5050, loss[loss=0.1261, beats_loss=0.01171, ecapa_loss=0.0002563, whisper_loss=0.1118, over 23983.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01217, ecapa_loss=0.0002772, whisper_loss=0.09745, over 3886342.63 frames. ], batch size: 93, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:31:25,584 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 09:31:41,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.04 vs. limit=15.0 2024-08-10 09:31:46,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=485570.0, ans=0.125 2024-08-10 09:31:53,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485570.0, ans=0.125 2024-08-10 09:31:55,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2024-08-10 09:32:06,230 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 09:32:16,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5100, loss[loss=0.1339, beats_loss=0.01067, ecapa_loss=0.0003035, whisper_loss=0.1202, over 22521.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01219, ecapa_loss=0.0002772, whisper_loss=0.09734, over 3868037.90 frames. 
], batch size: 92, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:32:19,054 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 09:32:32,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-08-10 09:32:36,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=485870.0, ans=0.0 2024-08-10 09:32:40,518 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 09:32:45,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.495e+01 3.245e+01 3.767e+01 4.403e+01 1.091e+02, threshold=7.533e+01, percent-clipped=4.0 2024-08-10 09:32:51,020 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 09:33:00,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2024-08-10 09:33:19,197 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 from AS 2024-08-10 09:33:20,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5150, loss[loss=0.08892, beats_loss=0.01398, ecapa_loss=0.0002721, whisper_loss=0.07222, over 13765.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.0122, ecapa_loss=0.0002751, whisper_loss=0.09713, over 3871700.55 frames. ], batch size: 55, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:33:29,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=486270.0, ans=0.125 2024-08-10 09:33:39,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.03 vs. 
limit=15.0 2024-08-10 09:33:47,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=486470.0, ans=0.125 2024-08-10 09:33:50,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-08-10 09:34:05,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=486570.0, ans=0.125 2024-08-10 09:34:07,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=15.0 2024-08-10 09:34:19,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=486670.0, ans=0.125 2024-08-10 09:34:22,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5200, loss[loss=0.1093, beats_loss=0.01269, ecapa_loss=0.0003123, whisper_loss=0.09352, over 14425.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01214, ecapa_loss=0.0002746, whisper_loss=0.09728, over 3847490.46 frames. ], batch size: 58, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:34:26,620 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 09:34:34,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=486870.0, ans=0.2 2024-08-10 09:34:37,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-10 09:34:42,095 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
28 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 09:34:43,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=486870.0, ans=0.0 2024-08-10 09:34:47,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=486970.0, ans=0.07 2024-08-10 09:34:51,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 3.041e+01 3.408e+01 4.043e+01 9.843e+01, threshold=6.816e+01, percent-clipped=1.0 2024-08-10 09:35:05,952 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 09:35:16,552 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 09:35:20,040 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 09:35:25,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5250, loss[loss=0.1318, beats_loss=0.01043, ecapa_loss=0.0002361, whisper_loss=0.119, over 18606.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01224, ecapa_loss=0.0002745, whisper_loss=0.09677, over 3846961.98 frames. 
], batch size: 68, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:35:34,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=487270.0, ans=0.125 2024-08-10 09:35:43,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=487370.0, ans=0.0 2024-08-10 09:36:05,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487570.0, ans=0.1 2024-08-10 09:36:15,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=487570.0, ans=0.2 2024-08-10 09:36:23,232 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 09:36:29,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5300, loss[loss=0.09948, beats_loss=0.01242, ecapa_loss=0.000277, whisper_loss=0.08429, over 16437.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01213, ecapa_loss=0.0002727, whisper_loss=0.09712, over 3854019.22 frames. ], batch size: 66, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:36:33,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=487770.0, ans=0.0 2024-08-10 09:36:34,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=487770.0, ans=0.125 2024-08-10 09:36:35,050 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-10 09:36:36,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.62 vs. 
limit=15.0 2024-08-10 09:36:38,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=487770.0, ans=0.2 2024-08-10 09:36:42,246 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 09:36:58,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 3.142e+01 3.530e+01 4.338e+01 6.802e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-10 09:37:03,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487970.0, ans=0.1 2024-08-10 09:37:23,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2024-08-10 09:37:29,455 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-10 09:37:33,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5350, loss[loss=0.1204, beats_loss=0.009396, ecapa_loss=0.0002608, whisper_loss=0.1084, over 19020.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01216, ecapa_loss=0.0002704, whisper_loss=0.09722, over 3855373.18 frames. ], batch size: 73, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:37:35,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=488270.0, ans=0.09899494936611666 2024-08-10 09:38:15,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.99 vs. 
limit=15.0 2024-08-10 09:38:22,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=488570.0, ans=15.0 2024-08-10 09:38:36,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5400, loss[loss=0.1089, beats_loss=0.009383, ecapa_loss=0.0003805, whisper_loss=0.09574, over 17978.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.0121, ecapa_loss=0.0002707, whisper_loss=0.09734, over 3849459.47 frames. ], batch size: 76, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:38:47,853 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 09:38:54,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=488870.0, ans=0.0 2024-08-10 09:39:02,848 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 09:39:04,381 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 09:39:05,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.881e+01 3.134e+01 3.602e+01 5.252e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-10 09:39:06,909 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 09:39:28,615 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 09:39:32,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=489170.0, ans=0.125 2024-08-10 09:39:33,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=489170.0, ans=0.125 2024-08-10 09:39:39,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5450, loss[loss=0.108, beats_loss=0.01332, ecapa_loss=0.0002832, whisper_loss=0.09187, over 19075.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01208, ecapa_loss=0.000273, whisper_loss=0.09724, over 3830567.14 frames. ], batch size: 75, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:39:40,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=489270.0, ans=0.2 2024-08-10 09:40:23,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-10 09:40:25,678 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 09:40:28,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=489570.0, ans=0.0 2024-08-10 09:40:33,466 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 09:40:34,949 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 09:40:41,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=489670.0, ans=0.125 2024-08-10 09:40:43,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5500, loss[loss=0.08599, beats_loss=0.01678, ecapa_loss=0.0001814, whisper_loss=0.0674, over 15354.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01216, ecapa_loss=0.0002714, whisper_loss=0.09679, over 3838575.67 frames. ], batch size: 59, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:40:43,842 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 13 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 09:40:50,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=489770.0, ans=0.0 2024-08-10 09:40:56,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489870.0, ans=0.1 2024-08-10 09:40:57,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=489870.0, ans=0.5 2024-08-10 09:41:02,496 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 09:41:12,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 3.173e+01 3.591e+01 4.081e+01 1.350e+02, threshold=7.183e+01, percent-clipped=1.0 2024-08-10 09:41:14,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=489970.0, ans=0.125 2024-08-10 09:41:28,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=490070.0, ans=0.0 2024-08-10 09:41:31,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=490070.0, ans=0.125 2024-08-10 09:41:36,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2024-08-10 09:41:37,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.37 vs. 
limit=12.0 2024-08-10 09:41:47,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5550, loss[loss=0.1115, beats_loss=0.01383, ecapa_loss=0.0002749, whisper_loss=0.0949, over 20660.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01221, ecapa_loss=0.0002753, whisper_loss=0.09625, over 3876336.53 frames. ], batch size: 88, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:41:47,934 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 09:41:48,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=490270.0, ans=0.07 2024-08-10 09:41:50,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=490270.0, ans=0.0 2024-08-10 09:41:53,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=490270.0, ans=0.035 2024-08-10 09:42:03,587 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 09:42:06,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=490370.0, ans=15.0 2024-08-10 09:42:15,917 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 09:42:19,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.64 vs. limit=15.0 2024-08-10 09:42:24,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=490470.0, ans=0.0 2024-08-10 09:42:26,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=490570.0, ans=0.125 2024-08-10 09:42:30,093 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
38 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-10 09:42:38,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=22.5 2024-08-10 09:42:40,364 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 09:42:51,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5600, loss[loss=0.105, beats_loss=0.0146, ecapa_loss=0.0002719, whisper_loss=0.0877, over 21309.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01228, ecapa_loss=0.0002726, whisper_loss=0.09634, over 3885032.08 frames. ], batch size: 89, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:42:52,440 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 09:43:13,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2024-08-10 09:43:19,480 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
21 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-10 09:43:20,400 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 3.015e+01 3.404e+01 4.297e+01 6.726e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-10 09:43:30,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491070.0, ans=0.1 2024-08-10 09:43:41,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=491070.0, ans=0.125 2024-08-10 09:43:48,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=491170.0, ans=0.0 2024-08-10 09:43:52,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=491170.0, ans=0.2 2024-08-10 09:43:55,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5650, loss[loss=0.1144, beats_loss=0.01193, ecapa_loss=0.0002865, whisper_loss=0.0996, over 14862.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01238, ecapa_loss=0.0002703, whisper_loss=0.0957, over 3889820.52 frames. ], batch size: 60, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:44:11,371 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 09:44:18,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=491370.0, ans=0.2 2024-08-10 09:44:26,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=491470.0, ans=0.0 2024-08-10 09:44:27,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=491470.0, ans=0.125 2024-08-10 09:44:34,015 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 09:44:47,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-08-10 09:44:59,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5700, loss[loss=0.09672, beats_loss=0.01301, ecapa_loss=0.0002464, whisper_loss=0.08125, over 18959.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01237, ecapa_loss=0.0002712, whisper_loss=0.09565, over 3878701.67 frames. ], batch size: 73, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:45:02,855 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 09:45:11,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-10 09:45:13,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491870.0, ans=0.1 2024-08-10 09:45:19,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491870.0, ans=0.1 2024-08-10 09:45:20,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=491870.0, ans=0.0 2024-08-10 09:45:33,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.076e+01 3.438e+01 4.149e+01 8.224e+01, threshold=6.876e+01, percent-clipped=3.0 2024-08-10 09:45:47,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=492070.0, ans=0.125 2024-08-10 09:45:48,856 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 09:45:51,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=492070.0, ans=0.0 2024-08-10 09:45:52,647 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-10 09:46:07,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=492170.0, ans=0.125 2024-08-10 09:46:08,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-08-10 09:46:10,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=492170.0, ans=0.0 2024-08-10 09:46:14,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5750, loss[loss=0.1068, beats_loss=0.01083, ecapa_loss=0.0003012, whisper_loss=0.09297, over 16618.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01246, ecapa_loss=0.0002704, whisper_loss=0.09556, over 3919600.82 frames. ], batch size: 66, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:46:24,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.77 vs. limit=22.5 2024-08-10 09:46:26,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=492270.0, ans=0.125 2024-08-10 09:46:44,697 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 09:47:09,768 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 09:47:11,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. 
limit=22.5 2024-08-10 09:47:15,123 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 09:47:26,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=492670.0, ans=0.0 2024-08-10 09:47:37,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=492770.0, ans=0.0 2024-08-10 09:47:37,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5800, loss[loss=0.1239, beats_loss=0.01259, ecapa_loss=0.0002083, whisper_loss=0.1092, over 24229.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01242, ecapa_loss=0.0002701, whisper_loss=0.09565, over 3884682.77 frames. ], batch size: 92, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:47:49,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-10 09:48:01,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492870.0, ans=0.1 2024-08-10 09:48:08,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=492970.0, ans=0.125 2024-08-10 09:48:08,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-10 09:48:11,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 3.153e+01 3.469e+01 4.030e+01 1.339e+02, threshold=6.938e+01, percent-clipped=1.0 2024-08-10 09:48:17,558 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-10 09:48:17,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492970.0, ans=0.1 2024-08-10 09:48:24,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=493070.0, ans=0.125 2024-08-10 09:48:44,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=493170.0, ans=0.1 2024-08-10 09:48:47,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5850, loss[loss=0.1208, beats_loss=0.01261, ecapa_loss=0.0002213, whisper_loss=0.1059, over 14557.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01242, ecapa_loss=0.000271, whisper_loss=0.09585, over 3901121.30 frames. ], batch size: 55, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:49:04,624 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 09:49:06,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=493370.0, ans=0.0 2024-08-10 09:49:08,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=493370.0, ans=0.125 2024-08-10 09:49:09,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493370.0, ans=0.1 2024-08-10 09:49:25,141 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 09:49:26,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. 
limit=22.5 2024-08-10 09:49:27,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=493570.0, ans=0.5 2024-08-10 09:49:37,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=493670.0, ans=0.125 2024-08-10 09:49:45,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-10 09:49:47,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=493670.0, ans=0.0 2024-08-10 09:49:49,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-08-10 09:49:51,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5900, loss[loss=0.0911, beats_loss=0.01636, ecapa_loss=0.0002487, whisper_loss=0.07225, over 15717.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01252, ecapa_loss=0.0002712, whisper_loss=0.09484, over 3886970.64 frames. ], batch size: 68, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:49:56,562 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 09:50:04,223 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-10 09:50:04,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2024-08-10 09:50:05,327 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 09:50:11,844 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 09:50:13,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=493870.0, ans=0.0 2024-08-10 09:50:20,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+01 2.982e+01 3.256e+01 3.844e+01 1.503e+02, threshold=6.513e+01, percent-clipped=1.0 2024-08-10 09:50:27,300 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-10 09:50:29,877 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 09:50:32,318 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 09:50:33,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=494070.0, ans=0.09899494936611666 2024-08-10 09:50:34,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2024-08-10 09:50:34,795 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-10 09:50:38,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=494070.0, ans=0.0 2024-08-10 09:50:41,283 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 09:50:52,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=494170.0, ans=0.09899494936611666 2024-08-10 09:50:54,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 5950, loss[loss=0.1264, beats_loss=0.007873, ecapa_loss=0.0003119, whisper_loss=0.1154, over 16080.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01247, ecapa_loss=0.0002717, whisper_loss=0.09458, over 3856083.24 frames. 
], batch size: 61, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:50:55,941 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 09:51:00,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=494270.0, ans=0.0 2024-08-10 09:51:00,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=494270.0, ans=0.125 2024-08-10 09:51:05,220 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 09:51:14,025 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 09:51:14,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.14 vs. limit=10.0 2024-08-10 09:51:17,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=494370.0, ans=0.125 2024-08-10 09:51:24,582 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 09:51:26,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=494470.0, ans=0.0 2024-08-10 09:51:52,292 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 09:51:58,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6000, loss[loss=0.07761, beats_loss=0.0162, ecapa_loss=0.0002514, whisper_loss=0.05889, over 21892.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01235, ecapa_loss=0.0002704, whisper_loss=0.09509, over 3865920.81 frames. 
], batch size: 94, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:51:58,751 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 09:52:40,025 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on ASR_libri: loss=0.2669, beats_loss=0, ecapa_loss=0.0008114, whisper_loss=0.2588, over 922467.00 frames. 2024-08-10 09:52:55,577 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on SV_voxceleb1: loss=0.00707, beats_loss=0, ecapa_loss=0.000707, whisper_loss=0, over 939242.00 frames. 2024-08-10 09:53:45,199 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2607, 3.2612, 3.7767, 3.2944], device='cuda:3') 2024-08-10 09:54:53,726 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on AT_audioset: loss=0.028, beats_loss=0.028, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 09:54:53,730 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 09:55:16,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=494870.0, ans=0.0 2024-08-10 09:55:20,303 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 09:55:22,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=494970.0, ans=0.1 2024-08-10 09:55:23,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.994e+01 3.624e+01 4.180e+01 6.998e+01, threshold=7.249e+01, percent-clipped=2.0 2024-08-10 09:55:31,227 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
27 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 09:55:54,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=495170.0, ans=0.125 2024-08-10 09:55:58,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6050, loss[loss=0.1255, beats_loss=0.01207, ecapa_loss=0.0002604, whisper_loss=0.1108, over 22467.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0122, ecapa_loss=0.0002715, whisper_loss=0.0956, over 3831188.12 frames. ], batch size: 87, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:56:27,897 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 09:56:28,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=495470.0, ans=0.125 2024-08-10 09:56:32,752 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 09:56:34,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=495470.0, ans=0.125 2024-08-10 09:57:02,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6100, loss[loss=0.1023, beats_loss=0.01124, ecapa_loss=0.0002676, whisper_loss=0.08839, over 19643.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01216, ecapa_loss=0.0002722, whisper_loss=0.09598, over 3852299.15 frames. ], batch size: 77, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:57:08,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=495770.0, ans=0.04949747468305833 2024-08-10 09:57:31,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.767e+01 3.161e+01 3.682e+01 7.056e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 09:57:46,184 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
30 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-10 09:57:59,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=496170.0, ans=0.125 2024-08-10 09:58:05,739 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 09:58:06,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6150, loss[loss=0.1228, beats_loss=0.01045, ecapa_loss=0.0002537, whisper_loss=0.1098, over 23193.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01218, ecapa_loss=0.0002718, whisper_loss=0.09643, over 3865945.54 frames. ], batch size: 93, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:58:08,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2024-08-10 09:58:16,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=496270.0, ans=0.125 2024-08-10 09:58:22,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496370.0, ans=0.0 2024-08-10 09:59:10,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6200, loss[loss=0.132, beats_loss=0.009626, ecapa_loss=0.0002925, whisper_loss=0.1194, over 14772.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01224, ecapa_loss=0.0002711, whisper_loss=0.09601, over 3856586.36 frames. ], batch size: 57, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 09:59:11,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2024-08-10 09:59:19,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496770.0, ans=0.1 2024-08-10 09:59:20,250 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 09:59:24,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=496870.0, ans=0.2 2024-08-10 09:59:35,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496870.0, ans=0.0 2024-08-10 09:59:40,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 3.143e+01 3.568e+01 4.018e+01 6.093e+01, threshold=7.137e+01, percent-clipped=0.0 2024-08-10 09:59:55,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=497070.0, ans=0.2 2024-08-10 09:59:56,694 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 10:00:04,895 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 10:00:05,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=497170.0, ans=0.0 2024-08-10 10:00:06,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497170.0, ans=0.125 2024-08-10 10:00:15,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5 2024-08-10 10:00:16,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6250, loss[loss=0.119, beats_loss=0.0112, ecapa_loss=0.0002335, whisper_loss=0.1054, over 20992.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01213, ecapa_loss=0.0002711, whisper_loss=0.0966, over 3897927.54 frames. 
], batch size: 81, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:00:26,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=497270.0, ans=0.125 2024-08-10 10:00:35,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=497370.0, ans=0.125 2024-08-10 10:00:37,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=497370.0, ans=0.0 2024-08-10 10:00:37,912 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 10:00:38,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=497370.0, ans=0.2 2024-08-10 10:00:40,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=497470.0, ans=0.2 2024-08-10 10:00:41,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-10 10:01:09,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=497670.0, ans=10.0 2024-08-10 10:01:10,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=497670.0, ans=0.125 2024-08-10 10:01:20,911 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6300, loss[loss=0.1068, beats_loss=0.01542, ecapa_loss=0.0002302, whisper_loss=0.08907, over 20723.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01212, ecapa_loss=0.0002705, whisper_loss=0.0969, over 3902020.10 frames. ], batch size: 79, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:01:25,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=12.0 2024-08-10 10:01:27,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=497770.0, ans=0.2 2024-08-10 10:01:27,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-08-10 10:01:43,106 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 10:01:45,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497970.0, ans=0.1 2024-08-10 10:01:50,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+01 3.096e+01 3.544e+01 4.139e+01 6.723e+01, threshold=7.089e+01, percent-clipped=0.0 2024-08-10 10:01:53,521 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 10:01:59,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.25 vs. limit=15.0 2024-08-10 10:02:10,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=498070.0, ans=0.125 2024-08-10 10:02:25,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6350, loss[loss=0.1232, beats_loss=0.00918, ecapa_loss=0.000337, whisper_loss=0.1107, over 18398.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01215, ecapa_loss=0.0002702, whisper_loss=0.09685, over 3900399.35 frames. 
], batch size: 74, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:02:37,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=498370.0, ans=0.125 2024-08-10 10:03:02,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-10 10:03:09,603 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 10:03:10,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2024-08-10 10:03:11,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=498570.0, ans=0.07 2024-08-10 10:03:25,187 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 10:03:26,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-10 10:03:28,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.05 vs. limit=22.5 2024-08-10 10:03:29,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6400, loss[loss=0.1222, beats_loss=0.01147, ecapa_loss=0.000232, whisper_loss=0.1084, over 20876.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01209, ecapa_loss=0.000271, whisper_loss=0.0969, over 3874241.10 frames. 
], batch size: 81, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:03:36,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498770.0, ans=0.0 2024-08-10 10:03:41,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=498770.0, ans=0.125 2024-08-10 10:03:42,964 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 10:03:44,271 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 10:03:47,004 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 10:03:51,053 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 29 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 10:03:58,217 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-10 10:04:01,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.035e+01 3.531e+01 4.097e+01 5.944e+01, threshold=7.062e+01, percent-clipped=0.0 2024-08-10 10:04:03,928 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 10:04:05,921 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 10:04:10,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2024-08-10 10:04:13,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=499070.0, ans=0.0 2024-08-10 10:04:16,173 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 10:04:16,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=499070.0, ans=0.09899494936611666 2024-08-10 10:04:19,817 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.952e+05 2024-08-10 10:04:26,923 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 10:04:30,011 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 10:04:43,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6450, loss[loss=0.07377, beats_loss=0.01569, ecapa_loss=0.0002284, whisper_loss=0.0558, over 13442.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01218, ecapa_loss=0.000268, whisper_loss=0.09698, over 3887071.67 frames. ], batch size: 54, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:04:53,366 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 10:04:53,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=499270.0, ans=0.125 2024-08-10 10:05:08,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499370.0, ans=0.125 2024-08-10 10:05:11,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499370.0, ans=0.125 2024-08-10 10:05:14,008 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 10:05:20,349 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 10:05:29,060 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
15 from LS+wenet, 29 from Vox, 15 fro AS 2024-08-10 10:05:30,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-10 10:05:31,567 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 10:05:36,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=499570.0, ans=0.0 2024-08-10 10:05:52,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=499670.0, ans=0.2 2024-08-10 10:05:52,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=32.38 vs. limit=22.5 2024-08-10 10:05:55,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=499670.0, ans=0.125 2024-08-10 10:05:58,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6500, loss[loss=0.09346, beats_loss=0.01452, ecapa_loss=0.0002872, whisper_loss=0.07607, over 21868.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01217, ecapa_loss=0.0002694, whisper_loss=0.09652, over 3907508.56 frames. ], batch size: 92, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:06:09,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. 
limit=15.0 2024-08-10 10:06:28,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=499970.0, ans=10.0 2024-08-10 10:06:33,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.134e+01 3.492e+01 3.881e+01 6.321e+01, threshold=6.984e+01, percent-clipped=0.0 2024-08-10 10:07:04,207 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 10:07:15,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6550, loss[loss=0.1167, beats_loss=0.01149, ecapa_loss=0.0003079, whisper_loss=0.1021, over 21302.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01227, ecapa_loss=0.0002685, whisper_loss=0.09591, over 3930504.14 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:07:26,521 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 10:07:54,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=500470.0, ans=0.0 2024-08-10 10:08:01,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=500470.0, ans=0.0 2024-08-10 10:08:02,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=500470.0, ans=0.125 2024-08-10 10:08:04,732 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 10:08:10,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=500570.0, ans=0.2 2024-08-10 10:08:11,975 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 10:08:18,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.55 vs. 
limit=15.0 2024-08-10 10:08:41,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6600, loss[loss=0.1096, beats_loss=0.012, ecapa_loss=0.0003225, whisper_loss=0.09434, over 21533.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01227, ecapa_loss=0.0002687, whisper_loss=0.09676, over 3962622.91 frames. ], batch size: 89, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:09:00,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=500870.0, ans=0.0 2024-08-10 10:09:18,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 3.113e+01 3.580e+01 3.995e+01 6.180e+01, threshold=7.160e+01, percent-clipped=0.0 2024-08-10 10:09:32,094 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-10 10:10:00,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6650, loss[loss=0.1054, beats_loss=0.0148, ecapa_loss=0.0002487, whisper_loss=0.0881, over 20627.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01228, ecapa_loss=0.0002697, whisper_loss=0.09635, over 3969439.66 frames. ], batch size: 83, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:10:13,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=501270.0, ans=0.035 2024-08-10 10:10:15,315 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 10:10:22,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501370.0, ans=0.1 2024-08-10 10:10:27,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=501370.0, ans=0.2 2024-08-10 10:10:33,005 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.919e+01 2024-08-10 10:11:10,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=501670.0, ans=0.125 2024-08-10 10:11:14,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5 2024-08-10 10:11:16,727 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 10:11:16,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=501670.0, ans=0.0 2024-08-10 10:11:21,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6700, loss[loss=0.09609, beats_loss=0.01116, ecapa_loss=0.0004273, whisper_loss=0.08066, over 14524.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01212, ecapa_loss=0.0002706, whisper_loss=0.09732, over 3942166.40 frames. ], batch size: 62, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:11:22,281 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 10:11:23,982 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 10:11:27,795 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 10:11:42,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=501870.0, ans=0.125 2024-08-10 10:11:46,376 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 10:11:46,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-10 10:11:59,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=501970.0, ans=0.0 2024-08-10 10:12:00,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.966e+01 3.489e+01 3.963e+01 6.232e+01, threshold=6.977e+01, percent-clipped=0.0 2024-08-10 10:12:11,997 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 10:12:15,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=502070.0, ans=0.125 2024-08-10 10:12:45,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6750, loss[loss=0.09634, beats_loss=0.01098, ecapa_loss=0.0002686, whisper_loss=0.08267, over 18351.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01215, ecapa_loss=0.0002695, whisper_loss=0.09683, over 3949535.99 frames. 
], batch size: 74, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:12:48,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=502270.0, ans=0.0 2024-08-10 10:13:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=502470.0, ans=0.0 2024-08-10 10:13:25,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=502470.0, ans=0.125 2024-08-10 10:13:26,877 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-10 10:13:42,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-08-10 10:13:44,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=502570.0, ans=0.0 2024-08-10 10:13:53,616 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.567e-02 2024-08-10 10:14:11,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6800, loss[loss=0.125, beats_loss=0.01141, ecapa_loss=0.0003364, whisper_loss=0.1102, over 22773.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0121, ecapa_loss=0.0002701, whisper_loss=0.0969, over 3945693.98 frames. 
], batch size: 93, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:14:19,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=502770.0, ans=0.125 2024-08-10 10:14:37,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=502870.0, ans=0.0 2024-08-10 10:14:50,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.003e+01 3.545e+01 4.063e+01 8.445e+01, threshold=7.089e+01, percent-clipped=2.0 2024-08-10 10:15:00,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2024-08-10 10:15:07,059 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 10:15:12,474 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 10:15:25,508 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 10:15:27,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503170.0, ans=0.125 2024-08-10 10:15:28,306 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 10:15:28,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503170.0, ans=0.125 2024-08-10 10:15:32,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=503170.0, ans=0.0 2024-08-10 10:15:33,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.802e-01 2024-08-10 10:15:35,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6850, loss[loss=0.1075, beats_loss=0.01116, ecapa_loss=0.0003358, whisper_loss=0.09301, over 15389.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01206, ecapa_loss=0.0002704, whisper_loss=0.09711, over 3940001.13 frames. ], batch size: 64, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:15:47,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=22.5 2024-08-10 10:15:58,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=503370.0, ans=0.0 2024-08-10 10:16:14,231 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 10:16:17,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-10 10:16:21,981 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
11 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 10:16:23,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503570.0, ans=0.1 2024-08-10 10:16:25,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=503570.0, ans=0.025 2024-08-10 10:16:26,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=503570.0, ans=0.04949747468305833 2024-08-10 10:16:28,512 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 10:16:38,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=503670.0, ans=0.0 2024-08-10 10:16:54,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6900, loss[loss=0.1443, beats_loss=0.008085, ecapa_loss=0.0002935, whisper_loss=0.1332, over 20538.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01203, ecapa_loss=0.0002724, whisper_loss=0.09742, over 3918137.47 frames. ], batch size: 77, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:17:00,842 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 10:17:30,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.010e+01 3.385e+01 3.920e+01 6.674e+01, threshold=6.771e+01, percent-clipped=0.0 2024-08-10 10:17:33,668 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 10:17:37,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=503970.0, ans=0.09899494936611666 2024-08-10 10:17:49,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=504070.0, ans=0.0 2024-08-10 10:17:59,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=504170.0, ans=0.125 2024-08-10 10:18:07,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=504170.0, ans=0.125 2024-08-10 10:18:14,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 6950, loss[loss=0.1294, beats_loss=0.009989, ecapa_loss=0.0002984, whisper_loss=0.1164, over 20118.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01199, ecapa_loss=0.0002713, whisper_loss=0.09774, over 3900945.05 frames. ], batch size: 84, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:18:15,200 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 10:18:30,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=504370.0, ans=0.015 2024-08-10 10:18:36,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504370.0, ans=0.1 2024-08-10 10:18:48,203 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 10:19:02,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504470.0, ans=0.125 2024-08-10 10:19:09,971 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
10 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 10:19:17,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504570.0, ans=0.0 2024-08-10 10:19:19,494 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 10:19:24,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=504670.0, ans=0.0 2024-08-10 10:19:30,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504670.0, ans=0.125 2024-08-10 10:19:35,433 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 10:19:36,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7000, loss[loss=0.1078, beats_loss=0.01224, ecapa_loss=0.0002694, whisper_loss=0.0929, over 14774.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01203, ecapa_loss=0.0002697, whisper_loss=0.09727, over 3889749.21 frames. ], batch size: 59, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:19:55,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-10 10:20:08,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2024-08-10 10:20:09,788 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 10:20:11,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504970.0, ans=0.0 2024-08-10 10:20:12,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.871e+01 3.202e+01 3.824e+01 7.169e+01, threshold=6.405e+01, percent-clipped=1.0 2024-08-10 10:20:13,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504970.0, ans=0.0 2024-08-10 10:20:29,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=505070.0, ans=0.125 2024-08-10 10:20:31,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=505070.0, ans=0.125 2024-08-10 10:20:39,013 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 10:20:52,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=505170.0, ans=0.125 2024-08-10 10:20:57,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7050, loss[loss=0.1068, beats_loss=0.01476, ecapa_loss=0.0002703, whisper_loss=0.08938, over 19995.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01207, ecapa_loss=0.0002674, whisper_loss=0.09734, over 3895898.72 frames. 
], batch size: 80, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:21:08,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=505270.0, ans=0.125 2024-08-10 10:21:15,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=505370.0, ans=0.125 2024-08-10 10:21:17,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2024-08-10 10:22:09,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=505670.0, ans=0.125 2024-08-10 10:22:09,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=505670.0, ans=0.0 2024-08-10 10:22:15,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=505770.0, ans=0.125 2024-08-10 10:22:16,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7100, loss[loss=0.09564, beats_loss=0.01466, ecapa_loss=0.000183, whisper_loss=0.07915, over 20147.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01211, ecapa_loss=0.0002658, whisper_loss=0.09657, over 3887152.79 frames. ], batch size: 77, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:22:24,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=505770.0, ans=0.09899494936611666 2024-08-10 10:22:28,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.55 vs. 
limit=22.5 2024-08-10 10:22:38,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=505870.0, ans=0.125 2024-08-10 10:22:41,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=22.5 2024-08-10 10:22:46,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=505870.0, ans=0.125 2024-08-10 10:22:46,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=505870.0, ans=0.0 2024-08-10 10:22:52,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:22:54,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.041e+01 3.472e+01 4.120e+01 8.517e+01, threshold=6.943e+01, percent-clipped=2.0 2024-08-10 10:23:25,687 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 10:23:36,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7150, loss[loss=0.1188, beats_loss=0.01254, ecapa_loss=0.0002176, whisper_loss=0.1041, over 21090.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01205, ecapa_loss=0.0002676, whisper_loss=0.09665, over 3884905.67 frames. ], batch size: 81, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:23:51,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-10 10:24:03,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=15.0 2024-08-10 10:24:06,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=506370.0, ans=0.125 2024-08-10 10:24:18,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506470.0, ans=0.1 2024-08-10 10:24:24,220 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 10:24:34,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=506570.0, ans=0.125 2024-08-10 10:24:36,596 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 10:24:52,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=506670.0, ans=0.0 2024-08-10 10:24:55,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7200, loss[loss=0.1019, beats_loss=0.00827, ecapa_loss=0.0002619, whisper_loss=0.09104, over 16457.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01202, ecapa_loss=0.0002696, whisper_loss=0.09735, over 3904688.96 frames. ], batch size: 62, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:25:35,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 3.179e+01 3.637e+01 4.087e+01 6.923e+01, threshold=7.273e+01, percent-clipped=0.0 2024-08-10 10:25:52,062 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 10:25:52,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=507070.0, ans=0.125 2024-08-10 10:25:59,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.98 vs. 
limit=15.0 2024-08-10 10:26:05,005 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 10:26:08,279 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 10:26:13,803 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 10:26:17,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507270.0, ans=0.1 2024-08-10 10:26:18,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7250, loss[loss=0.1399, beats_loss=0.01054, ecapa_loss=0.0002839, whisper_loss=0.1265, over 22562.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.0121, ecapa_loss=0.0002687, whisper_loss=0.09734, over 3941582.02 frames. ], batch size: 90, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:26:23,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=507270.0, ans=10.0 2024-08-10 10:26:45,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507370.0, ans=0.0 2024-08-10 10:26:46,750 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-10 10:26:51,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507470.0, ans=0.1 2024-08-10 10:26:57,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=507470.0, ans=0.2 2024-08-10 10:26:58,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=507470.0, ans=0.125 2024-08-10 10:27:12,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=507570.0, ans=0.125 2024-08-10 10:27:13,525 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 10:27:37,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7300, loss[loss=0.1012, beats_loss=0.01171, ecapa_loss=0.0002556, whisper_loss=0.08697, over 15095.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.012, ecapa_loss=0.0002702, whisper_loss=0.09743, over 3901513.21 frames. ], batch size: 59, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:27:46,360 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.014e+01 2024-08-10 10:27:58,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=507870.0, ans=0.0 2024-08-10 10:28:04,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=507870.0, ans=0.125 2024-08-10 10:28:16,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.965e+01 3.375e+01 3.820e+01 5.473e+01, threshold=6.750e+01, percent-clipped=0.0 2024-08-10 10:28:17,093 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 10:28:18,744 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 10:28:25,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=507970.0, ans=0.125 2024-08-10 10:28:27,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-10 10:28:28,055 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 12 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 10:28:35,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.90 vs. limit=10.0 2024-08-10 10:28:43,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=508170.0, ans=0.125 2024-08-10 10:28:48,994 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 10:28:53,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=508170.0, ans=0.5 2024-08-10 10:28:55,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=508170.0, ans=0.0 2024-08-10 10:28:59,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7350, loss[loss=0.1257, beats_loss=0.01066, ecapa_loss=0.0002446, whisper_loss=0.1126, over 23889.00 frames. ], tot_loss[loss=0.111, beats_loss=0.0121, ecapa_loss=0.0002707, whisper_loss=0.09616, over 3879255.59 frames. ], batch size: 90, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:29:23,715 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 10:29:32,696 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 10:29:41,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.44 vs. limit=22.5 2024-08-10 10:29:42,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2024-08-10 10:29:50,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=508570.0, ans=0.5 2024-08-10 10:30:17,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508670.0, ans=0.125 2024-08-10 10:30:21,289 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 10:30:26,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7400, loss[loss=0.08566, beats_loss=0.008173, ecapa_loss=0.0003989, whisper_loss=0.0735, over 12404.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01204, ecapa_loss=0.0002702, whisper_loss=0.09649, over 3882709.49 frames. 
], batch size: 54, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:31:00,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=508970.0, ans=0.125 2024-08-10 10:31:05,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=508970.0, ans=0.0 2024-08-10 10:31:05,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.905e+01 3.226e+01 3.755e+01 5.990e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-10 10:31:10,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=508970.0, ans=0.125 2024-08-10 10:31:14,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=508970.0, ans=0.125 2024-08-10 10:31:23,615 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 10:31:24,958 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 10:31:27,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=509070.0, ans=0.125 2024-08-10 10:31:28,524 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 10:31:41,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509170.0, ans=0.1 2024-08-10 10:31:43,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-08-10 10:31:52,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7450, loss[loss=0.1183, beats_loss=0.01113, ecapa_loss=0.0002463, whisper_loss=0.1047, over 17776.00 frames. 
], tot_loss[loss=0.1114, beats_loss=0.01211, ecapa_loss=0.0002683, whisper_loss=0.09663, over 3890424.50 frames. ], batch size: 70, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:31:56,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2024-08-10 10:31:57,199 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-10 10:32:12,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=509370.0, ans=0.125 2024-08-10 10:32:25,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509470.0, ans=0.1 2024-08-10 10:32:27,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509470.0, ans=0.1 2024-08-10 10:32:34,380 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 10:32:48,142 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 10:32:50,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509570.0, ans=0.125 2024-08-10 10:32:55,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=509570.0, ans=0.0 2024-08-10 10:33:04,107 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 10:33:09,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=509670.0, ans=0.125 2024-08-10 10:33:14,449 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 10:33:17,695 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 10:33:18,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7500, loss[loss=0.1041, beats_loss=0.01037, ecapa_loss=0.0002566, whisper_loss=0.09116, over 17863.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01212, ecapa_loss=0.0002676, whisper_loss=0.09604, over 3893795.59 frames. ], batch size: 69, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:33:21,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=509770.0, ans=0.5 2024-08-10 10:33:58,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.154e+01 3.513e+01 4.160e+01 5.952e+01, threshold=7.025e+01, percent-clipped=0.0 2024-08-10 10:34:03,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=509970.0, ans=0.125 2024-08-10 10:34:22,339 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 10:34:22,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=510070.0, ans=0.125 2024-08-10 10:34:35,992 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 10:34:43,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7550, loss[loss=0.1316, beats_loss=0.01174, ecapa_loss=0.0002818, whisper_loss=0.117, over 22430.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01203, ecapa_loss=0.0002693, whisper_loss=0.09667, over 3879175.60 frames. ], batch size: 91, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:34:45,185 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 10:34:47,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-10 10:35:00,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510370.0, ans=0.1 2024-08-10 10:35:12,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=510370.0, ans=0.0 2024-08-10 10:35:22,434 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 24 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-10 10:35:22,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=510470.0, ans=0.2 2024-08-10 10:35:27,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2024-08-10 10:35:28,796 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 10:35:53,810 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 10:36:07,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7600, loss[loss=0.1302, beats_loss=0.009623, ecapa_loss=0.0003192, whisper_loss=0.1174, over 15288.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01212, ecapa_loss=0.0002687, whisper_loss=0.09541, over 3857383.00 frames. 
], batch size: 59, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:36:20,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=510770.0, ans=0.125 2024-08-10 10:36:26,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510870.0, ans=0.125 2024-08-10 10:36:29,553 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 10:36:32,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=510870.0, ans=0.125 2024-08-10 10:36:39,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=510870.0, ans=0.0 2024-08-10 10:36:45,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.819e+01 3.165e+01 3.521e+01 5.971e+01, threshold=6.331e+01, percent-clipped=0.0 2024-08-10 10:36:47,137 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 10:36:55,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510970.0, ans=0.1 2024-08-10 10:37:06,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=511070.0, ans=0.0 2024-08-10 10:37:11,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=511070.0, ans=0.125 2024-08-10 10:37:13,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=511070.0, ans=0.125 2024-08-10 10:37:23,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.20 vs. 
limit=15.0 2024-08-10 10:37:31,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=511170.0, ans=0.2 2024-08-10 10:37:32,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-10 10:37:34,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7650, loss[loss=0.1061, beats_loss=0.01063, ecapa_loss=0.0002912, whisper_loss=0.09257, over 15338.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01219, ecapa_loss=0.0002672, whisper_loss=0.09489, over 3862872.30 frames. ], batch size: 61, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:37:36,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=511270.0, ans=0.125 2024-08-10 10:37:47,561 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 10:37:47,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=511270.0, ans=0.2 2024-08-10 10:37:52,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=13.41 vs. 
limit=12.0 2024-08-10 10:37:54,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=511370.0, ans=0.125 2024-08-10 10:37:55,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=511370.0, ans=0.0 2024-08-10 10:38:00,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=511370.0, ans=0.125 2024-08-10 10:38:11,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2024-08-10 10:38:17,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=511470.0, ans=0.0 2024-08-10 10:38:28,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=511570.0, ans=0.09899494936611666 2024-08-10 10:38:40,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=511670.0, ans=0.125 2024-08-10 10:38:42,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-10 10:38:47,665 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 10:38:54,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.88 vs. limit=10.0 2024-08-10 10:38:58,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7700, loss[loss=0.1278, beats_loss=0.01362, ecapa_loss=0.0002478, whisper_loss=0.1117, over 19559.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01219, ecapa_loss=0.0002684, whisper_loss=0.09565, over 3881422.23 frames. 
], batch size: 75, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:39:02,884 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 10:39:05,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-10 10:39:18,895 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 10:39:20,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511870.0, ans=0.1 2024-08-10 10:39:23,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=511870.0, ans=0.125 2024-08-10 10:39:39,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 3.237e+01 3.581e+01 4.281e+01 8.585e+01, threshold=7.162e+01, percent-clipped=2.0 2024-08-10 10:39:46,023 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 10:39:55,950 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-10 10:40:01,313 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 10:40:09,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512170.0, ans=0.1 2024-08-10 10:40:22,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7750, loss[loss=0.1229, beats_loss=0.01071, ecapa_loss=0.0002902, whisper_loss=0.1093, over 16600.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01219, ecapa_loss=0.0002681, whisper_loss=0.09503, over 3879400.47 frames. 
], batch size: 66, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:40:25,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=512270.0, ans=0.125 2024-08-10 10:40:40,421 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 10:40:43,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=512370.0, ans=0.09899494936611666 2024-08-10 10:40:51,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=512370.0, ans=0.125 2024-08-10 10:40:54,442 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:41:00,875 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 10:41:10,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=512470.0, ans=0.04949747468305833 2024-08-10 10:41:10,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=512470.0, ans=15.0 2024-08-10 10:41:44,775 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 10:41:45,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7800, loss[loss=0.1214, beats_loss=0.01112, ecapa_loss=0.0002536, whisper_loss=0.1078, over 23325.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.0121, ecapa_loss=0.0002659, whisper_loss=0.09596, over 3900444.60 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:41:48,131 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 10:41:53,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=512770.0, ans=0.125 2024-08-10 10:41:58,554 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.371e+01 2024-08-10 10:42:01,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=512870.0, ans=0.125 2024-08-10 10:42:12,822 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 10:42:18,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=512970.0, ans=0.125 2024-08-10 10:42:20,258 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 10:42:23,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.017e+01 3.377e+01 3.890e+01 5.572e+01, threshold=6.753e+01, percent-clipped=0.0 2024-08-10 10:42:36,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=513070.0, ans=0.07 2024-08-10 10:42:38,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=513070.0, ans=0.2 2024-08-10 10:43:03,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7850, loss[loss=0.08548, beats_loss=0.01155, ecapa_loss=0.0002095, whisper_loss=0.07184, over 14418.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0122, ecapa_loss=0.000266, whisper_loss=0.09516, over 3848904.12 frames. ], batch size: 54, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:43:18,826 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
36 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-10 10:43:20,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=513370.0, ans=0.09899494936611666 2024-08-10 10:43:56,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=513570.0, ans=0.5 2024-08-10 10:44:27,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7900, loss[loss=0.1032, beats_loss=0.01159, ecapa_loss=0.0003527, whisper_loss=0.08809, over 20833.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01214, ecapa_loss=0.0002649, whisper_loss=0.0957, over 3864137.35 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:44:31,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=513770.0, ans=0.2 2024-08-10 10:44:42,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=513770.0, ans=0.035 2024-08-10 10:44:44,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-10 10:45:00,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=513970.0, ans=0.1 2024-08-10 10:45:03,085 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 10:45:05,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.988e+01 3.259e+01 3.767e+01 5.929e+01, threshold=6.519e+01, percent-clipped=0.0 2024-08-10 10:45:13,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=513970.0, ans=0.2 2024-08-10 10:45:26,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=514070.0, ans=0.05 2024-08-10 10:45:38,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=514170.0, ans=0.0 2024-08-10 10:45:40,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=514170.0, ans=0.125 2024-08-10 10:45:46,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=514170.0, ans=0.125 2024-08-10 10:45:50,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 7950, loss[loss=0.1047, beats_loss=0.01342, ecapa_loss=0.0002539, whisper_loss=0.08878, over 21980.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01211, ecapa_loss=0.0002633, whisper_loss=0.09588, over 3854395.37 frames. ], batch size: 91, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:45:52,339 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 10:46:07,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=514370.0, ans=0.125 2024-08-10 10:46:08,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=514370.0, ans=0.125 2024-08-10 10:46:38,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=514470.0, ans=0.0 2024-08-10 10:46:43,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=514570.0, ans=0.125 2024-08-10 10:46:43,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514570.0, ans=0.1 2024-08-10 10:46:49,723 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-10 10:46:53,096 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 27 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 10:47:11,094 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 10:47:12,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8000, loss[loss=0.1269, beats_loss=0.0101, ecapa_loss=0.0002965, whisper_loss=0.1138, over 19793.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01222, ecapa_loss=0.0002592, whisper_loss=0.09574, over 3851731.06 frames. ], batch size: 78, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:47:16,117 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
32 from LS+wenet, 20 from Vox, 22 from AS 2024-08-10 10:47:17,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=514770.0, ans=0.0 2024-08-10 10:47:22,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=514770.0, ans=0.0 2024-08-10 10:47:37,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=514870.0, ans=0.0 2024-08-10 10:47:40,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514870.0, ans=0.1 2024-08-10 10:47:52,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.849e+01 3.134e+01 3.665e+01 7.663e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 10:47:55,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=514970.0, ans=0.2 2024-08-10 10:48:40,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8050, loss[loss=0.09984, beats_loss=0.01041, ecapa_loss=0.0002838, whisper_loss=0.08658, over 22302.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01217, ecapa_loss=0.0002584, whisper_loss=0.09571, over 3845542.71 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:49:15,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=515270.0, ans=0.125 2024-08-10 10:49:24,914 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 10:49:25,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515270.0, ans=0.1 2024-08-10 10:49:38,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=515370.0, ans=0.0 2024-08-10 10:49:39,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=515370.0, ans=0.125 2024-08-10 10:49:46,459 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 from AS 2024-08-10 10:49:47,990 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 10:49:58,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=515470.0, ans=0.2 2024-08-10 10:50:09,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=515570.0, ans=0.125 2024-08-10 10:50:30,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=515670.0, ans=0.0 2024-08-10 10:50:41,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8100, loss[loss=0.1167, beats_loss=0.009787, ecapa_loss=0.0003088, whisper_loss=0.1038, over 17628.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01217, ecapa_loss=0.0002592, whisper_loss=0.09544, over 3858911.37 frames. ], batch size: 71, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:50:46,961 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
23 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 10:51:20,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.111e+01 3.674e+01 4.170e+01 5.858e+01, threshold=7.349e+01, percent-clipped=0.0 2024-08-10 10:51:25,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=515970.0, ans=0.0 2024-08-10 10:51:50,802 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 10:51:54,464 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 10 from Vox, 32 from AS 2024-08-10 10:51:54,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=516170.0, ans=0.125 2024-08-10 10:52:03,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8150, loss[loss=0.104, beats_loss=0.01146, ecapa_loss=0.0002859, whisper_loss=0.08972, over 16886.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01218, ecapa_loss=0.0002609, whisper_loss=0.09511, over 3873140.05 frames. ], batch size: 67, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:52:10,630 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 10:52:27,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2024-08-10 10:52:34,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516470.0, ans=0.1 2024-08-10 10:52:38,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=516470.0, ans=0.125 2024-08-10 10:52:41,429 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
19 from LS+wenet, 27 from Vox, 32 from AS 2024-08-10 10:52:41,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=516470.0, ans=0.125 2024-08-10 10:52:43,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=516470.0, ans=0.2 2024-08-10 10:53:03,495 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 10:53:13,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516670.0, ans=0.125 2024-08-10 10:53:23,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8200, loss[loss=0.1245, beats_loss=0.009037, ecapa_loss=0.0002438, whisper_loss=0.113, over 17793.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0121, ecapa_loss=0.0002617, whisper_loss=0.0958, over 3863168.53 frames. ], batch size: 68, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:53:25,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=516770.0, ans=0.0 2024-08-10 10:53:46,427 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts.
21 from LS+wenet, 23 from Vox, 21 from AS 2024-08-10 10:53:46,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=516870.0, ans=0.125 2024-08-10 10:53:59,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516970.0, ans=0.1 2024-08-10 10:54:00,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.913e+01 3.375e+01 3.842e+01 5.271e+01, threshold=6.749e+01, percent-clipped=0.0 2024-08-10 10:54:05,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=516970.0, ans=10.0 2024-08-10 10:54:09,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=516970.0, ans=0.0 2024-08-10 10:54:19,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-10 10:54:21,364 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 39 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 10:54:23,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=517070.0, ans=0.125 2024-08-10 10:54:23,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-10 10:54:26,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517170.0, ans=0.1 2024-08-10 10:54:28,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.41 vs.
limit=15.0 2024-08-10 10:54:42,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8250, loss[loss=0.1218, beats_loss=0.01082, ecapa_loss=0.0002653, whisper_loss=0.1084, over 19665.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01201, ecapa_loss=0.0002621, whisper_loss=0.09634, over 3838439.90 frames. ], batch size: 77, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:54:52,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=517270.0, ans=0.09899494936611666 2024-08-10 10:54:53,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=22.5 2024-08-10 10:55:23,499 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 10:55:28,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517570.0, ans=0.125 2024-08-10 10:55:43,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=517570.0, ans=0.0 2024-08-10 10:55:44,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=517670.0, ans=0.2 2024-08-10 10:55:56,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2024-08-10 10:55:56,840 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 10:56:00,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8300, loss[loss=0.106, beats_loss=0.008962, ecapa_loss=0.0002599, whisper_loss=0.09443, over 17691.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01202, ecapa_loss=0.0002603, whisper_loss=0.09649, over 3867546.39 frames.
], batch size: 68, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:56:05,760 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS 2024-08-10 10:56:26,086 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 10:56:36,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.966e+01 3.242e+01 3.921e+01 6.642e+01, threshold=6.483e+01, percent-clipped=0.0 2024-08-10 10:56:38,045 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 from AS 2024-08-10 10:56:42,271 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.722e+01 2024-08-10 10:57:11,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=518170.0, ans=0.125 2024-08-10 10:57:13,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=518170.0, ans=0.125 2024-08-10 10:57:19,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518170.0, ans=0.1 2024-08-10 10:57:24,695 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8350, loss[loss=0.1035, beats_loss=0.01453, ecapa_loss=0.0002274, whisper_loss=0.08674, over 18958.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01198, ecapa_loss=0.0002614, whisper_loss=0.09712, over 3876597.36 frames. ], batch size: 76, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:58:05,209 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
27 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 10:58:14,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=518470.0, ans=0.0 2024-08-10 10:58:24,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-10 10:58:27,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=518570.0, ans=0.125 2024-08-10 10:58:51,873 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-10 10:58:59,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8400, loss[loss=0.1053, beats_loss=0.01127, ecapa_loss=0.0002408, whisper_loss=0.09162, over 21740.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0119, ecapa_loss=0.0002626, whisper_loss=0.09787, over 3888614.75 frames. ], batch size: 83, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:59:07,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=518770.0, ans=0.2 2024-08-10 10:59:25,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=518870.0, ans=0.125 2024-08-10 10:59:29,333 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
24 from LS+wenet, 16 from Vox, 35 from AS 2024-08-10 10:59:37,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=518970.0, ans=0.125 2024-08-10 10:59:40,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518970.0, ans=0.1 2024-08-10 10:59:40,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 3.091e+01 3.394e+01 4.166e+01 8.578e+01, threshold=6.788e+01, percent-clipped=4.0 2024-08-10 11:00:12,319 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:00:24,312 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 11:00:27,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8450, loss[loss=0.1064, beats_loss=0.01156, ecapa_loss=0.0002892, whisper_loss=0.09195, over 22223.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01192, ecapa_loss=0.0002631, whisper_loss=0.09782, over 3895043.02 frames. ], batch size: 91, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:00:30,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=519270.0, ans=0.0 2024-08-10 11:00:53,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=519370.0, ans=0.125 2024-08-10 11:00:59,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2024-08-10 11:01:04,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=519470.0, ans=0.2 2024-08-10 11:01:10,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs.
limit=15.0 2024-08-10 11:01:17,421 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 11:01:26,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=519570.0, ans=0.125 2024-08-10 11:01:54,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8500, loss[loss=0.1031, beats_loss=0.01351, ecapa_loss=0.0002547, whisper_loss=0.08701, over 22594.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01199, ecapa_loss=0.0002648, whisper_loss=0.09746, over 3924490.63 frames. ], batch size: 93, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:01:56,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=519770.0, ans=0.125 2024-08-10 11:01:58,098 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 from AS 2024-08-10 11:02:04,380 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 11:02:35,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=519970.0, ans=0.0 2024-08-10 11:02:40,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 3.171e+01 3.733e+01 4.165e+01 6.058e+01, threshold=7.466e+01, percent-clipped=0.0 2024-08-10 11:02:40,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519970.0, ans=0.125 2024-08-10 11:02:45,307 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
28 from LS+wenet, 28 from Vox, 31 from AS 2024-08-10 11:03:06,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=520070.0, ans=0.2 2024-08-10 11:03:18,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520170.0, ans=0.1 2024-08-10 11:03:24,401 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 11:03:25,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=520270.0, ans=0.125 2024-08-10 11:03:26,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8550, loss[loss=0.0955, beats_loss=0.01301, ecapa_loss=0.0002617, whisper_loss=0.07987, over 18223.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01194, ecapa_loss=0.0002661, whisper_loss=0.09764, over 3923647.02 frames. ], batch size: 76, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:03:53,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520370.0, ans=0.125 2024-08-10 11:04:16,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=520470.0, ans=0.125 2024-08-10 11:04:35,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520570.0, ans=0.125 2024-08-10 11:04:45,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=520670.0, ans=0.125 2024-08-10 11:04:46,166 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
26 from LS+wenet, 13 from Vox, 20 from AS 2024-08-10 11:04:51,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=520670.0, ans=12.0 2024-08-10 11:04:56,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520770.0, ans=0.1 2024-08-10 11:04:57,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8600, loss[loss=0.12, beats_loss=0.008386, ecapa_loss=0.0002761, whisper_loss=0.1088, over 15280.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.0119, ecapa_loss=0.0002671, whisper_loss=0.09833, over 3917696.24 frames. ], batch size: 57, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:05:07,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=520770.0, ans=0.025 2024-08-10 11:05:09,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520770.0, ans=0.1 2024-08-10 11:05:21,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=520870.0, ans=0.1 2024-08-10 11:05:36,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 3.011e+01 3.429e+01 3.879e+01 6.555e+01, threshold=6.857e+01, percent-clipped=0.0 2024-08-10 11:05:37,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=520970.0, ans=0.0 2024-08-10 11:05:52,879 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
18 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 11:06:19,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=521170.0, ans=0.05 2024-08-10 11:06:27,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8650, loss[loss=0.1051, beats_loss=0.01264, ecapa_loss=0.0002171, whisper_loss=0.09025, over 21080.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01204, ecapa_loss=0.000267, whisper_loss=0.09728, over 3887273.45 frames. ], batch size: 81, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:06:42,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=521270.0, ans=0.0 2024-08-10 11:06:42,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-08-10 11:07:07,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=521470.0, ans=0.125 2024-08-10 11:07:30,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=521570.0, ans=0.0 2024-08-10 11:07:37,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521570.0, ans=0.125 2024-08-10 11:07:56,445 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 from AS 2024-08-10 11:07:56,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=521770.0, ans=0.125 2024-08-10 11:07:57,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8700, loss[loss=0.1019, beats_loss=0.01319, ecapa_loss=0.000233, whisper_loss=0.08635, over 17393.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01212, ecapa_loss=0.0002648, whisper_loss=0.09691, over 3893135.42 frames.
], batch size: 68, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:08:01,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2024-08-10 11:08:21,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521870.0, ans=0.1 2024-08-10 11:08:25,487 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 from AS 2024-08-10 11:08:36,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=521970.0, ans=0.0 2024-08-10 11:08:37,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.909e+01 3.289e+01 3.792e+01 9.063e+01, threshold=6.579e+01, percent-clipped=1.0 2024-08-10 11:08:44,033 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 from AS 2024-08-10 11:08:51,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.450e+01 2024-08-10 11:08:54,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522070.0, ans=0.1 2024-08-10 11:09:00,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.97 vs. limit=15.0 2024-08-10 11:09:08,598 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 18 from Vox, 16 from AS 2024-08-10 11:09:21,378 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 28 from Vox, 30 from AS 2024-08-10 11:09:25,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8750, loss[loss=0.08466, beats_loss=0.01355, ecapa_loss=0.0002471, whisper_loss=0.06864, over 18933.00 frames.
], tot_loss[loss=0.1117, beats_loss=0.01202, ecapa_loss=0.000265, whisper_loss=0.09702, over 3862843.88 frames. ], batch size: 77, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:09:30,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=522270.0, ans=0.125 2024-08-10 11:09:31,910 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 11:09:34,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-10 11:09:40,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=522270.0, ans=0.07 2024-08-10 11:10:27,950 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 11:10:31,951 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 11:10:52,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8800, loss[loss=0.1094, beats_loss=0.009553, ecapa_loss=0.0003431, whisper_loss=0.09642, over 18834.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01205, ecapa_loss=0.0002651, whisper_loss=0.09656, over 3847001.70 frames. ], batch size: 81, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:11:01,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=522770.0, ans=0.0 2024-08-10 11:11:16,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.69 vs.
limit=10.0 2024-08-10 11:11:21,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=522870.0, ans=0.0 2024-08-10 11:11:23,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=522870.0, ans=0.0 2024-08-10 11:11:32,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.151e+01 3.444e+01 3.946e+01 7.427e+01, threshold=6.887e+01, percent-clipped=2.0 2024-08-10 11:11:45,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=523070.0, ans=0.0 2024-08-10 11:11:56,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523070.0, ans=0.125 2024-08-10 11:12:21,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8850, loss[loss=0.1179, beats_loss=0.01074, ecapa_loss=0.0002068, whisper_loss=0.1051, over 15728.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01207, ecapa_loss=0.0002636, whisper_loss=0.0964, over 3853261.55 frames. ], batch size: 57, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:12:54,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-10 11:13:04,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=523470.0, ans=0.0 2024-08-10 11:13:08,036 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 11:13:10,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-10 11:13:11,483 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
19 from LS+wenet, 24 from Vox, 46 from AS 2024-08-10 11:13:23,770 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 11:13:50,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=523770.0, ans=0.0 2024-08-10 11:13:51,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8900, loss[loss=0.1352, beats_loss=0.00829, ecapa_loss=0.0002983, whisper_loss=0.124, over 22205.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01204, ecapa_loss=0.0002634, whisper_loss=0.09653, over 3852738.35 frames. ], batch size: 82, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:14:10,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523870.0, ans=0.125 2024-08-10 11:14:13,131 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 from AS 2024-08-10 11:14:35,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.997e+01 3.258e+01 3.778e+01 5.539e+01, threshold=6.517e+01, percent-clipped=0.0 2024-08-10 11:14:52,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=524070.0, ans=0.125 2024-08-10 11:14:52,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=524070.0, ans=0.0 2024-08-10 11:15:03,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=524070.0, ans=0.0 2024-08-10 11:15:14,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524170.0, ans=0.1 2024-08-10 11:15:22,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 8950, loss[loss=0.1154, beats_loss=0.01218, ecapa_loss=0.0002907, whisper_loss=0.1003, over 21394.00 frames.
], tot_loss[loss=0.1108, beats_loss=0.01199, ecapa_loss=0.0002652, whisper_loss=0.09616, over 3870430.92 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:15:23,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=524270.0, ans=0.09899494936611666 2024-08-10 11:15:42,767 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:15:48,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=524370.0, ans=0.2 2024-08-10 11:15:51,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=524370.0, ans=0.125 2024-08-10 11:15:51,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524370.0, ans=0.125 2024-08-10 11:16:06,866 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 11:16:16,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=524570.0, ans=0.125 2024-08-10 11:16:33,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2024-08-10 11:16:40,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524670.0, ans=0.1 2024-08-10 11:16:42,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.42 vs. limit=22.5 2024-08-10 11:16:49,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9000, loss[loss=0.1035, beats_loss=0.01194, ecapa_loss=0.0002545, whisper_loss=0.08901, over 19081.00 frames.
], tot_loss[loss=0.1106, beats_loss=0.012, ecapa_loss=0.0002666, whisper_loss=0.09597, over 3841375.16 frames. ], batch size: 75, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:16:49,798 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 11:17:36,039 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on ASR_libri: loss=0.2658, beats_loss=0, ecapa_loss=0.000793, whisper_loss=0.2579, over 922467.00 frames. 2024-08-10 11:17:54,624 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on SV_voxceleb1: loss=0.007025, beats_loss=0, ecapa_loss=0.0007025, whisper_loss=0, over 939242.00 frames. 2024-08-10 11:19:54,479 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on AT_audioset: loss=0.02753, beats_loss=0.02753, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 11:19:54,483 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 11:20:06,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-08-10 11:20:33,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 3.014e+01 3.320e+01 3.675e+01 5.799e+01, threshold=6.641e+01, percent-clipped=0.0 2024-08-10 11:20:42,521 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 11:20:44,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=525070.0, ans=0.125 2024-08-10 11:20:50,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=525070.0, ans=0.025 2024-08-10 11:21:19,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9050, loss[loss=0.1019, beats_loss=0.01398, ecapa_loss=0.0002611, whisper_loss=0.08527, over 20935.00 frames. 
], tot_loss[loss=0.1106, beats_loss=0.01204, ecapa_loss=0.0002652, whisper_loss=0.09591, over 3872440.91 frames. ], batch size: 89, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:21:29,565 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 11:21:43,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-08-10 11:21:56,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.55 vs. limit=10.0 2024-08-10 11:22:09,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2024-08-10 11:22:21,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=525570.0, ans=0.2 2024-08-10 11:22:28,485 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 11:22:30,004 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 11:22:30,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=525670.0, ans=0.0 2024-08-10 11:22:43,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9100, loss[loss=0.1189, beats_loss=0.01174, ecapa_loss=0.0002466, whisper_loss=0.1047, over 22423.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01197, ecapa_loss=0.0002669, whisper_loss=0.096, over 3860121.30 frames. 
], batch size: 90, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:22:44,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=525770.0, ans=0.125 2024-08-10 11:22:54,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525770.0, ans=0.1 2024-08-10 11:23:16,140 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 9 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 11:23:20,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.970e+01 3.325e+01 3.905e+01 6.354e+01, threshold=6.649e+01, percent-clipped=0.0 2024-08-10 11:23:26,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=525970.0, ans=0.2 2024-08-10 11:23:36,274 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 11:23:37,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=526070.0, ans=0.2 2024-08-10 11:23:38,981 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 11:23:42,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=526070.0, ans=0.0 2024-08-10 11:23:45,056 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 11:23:55,610 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 11:24:03,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9150, loss[loss=0.1291, beats_loss=0.01244, ecapa_loss=0.000232, whisper_loss=0.1144, over 22588.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01208, ecapa_loss=0.0002668, whisper_loss=0.09498, over 3867834.03 frames. 
], batch size: 89, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:24:09,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=526270.0, ans=0.0 2024-08-10 11:24:28,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=526370.0, ans=0.125 2024-08-10 11:24:44,805 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 11:24:59,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=526570.0, ans=0.125 2024-08-10 11:24:59,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=526570.0, ans=0.125 2024-08-10 11:25:01,892 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 11:25:18,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9200, loss[loss=0.1133, beats_loss=0.01085, ecapa_loss=0.0002293, whisper_loss=0.1001, over 14216.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01198, ecapa_loss=0.0002671, whisper_loss=0.09597, over 3876656.30 frames. ], batch size: 54, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:25:40,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=526870.0, ans=10.0 2024-08-10 11:25:43,117 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
25 from LS+wenet, 17 from Vox, 13 fro AS 2024-08-10 11:25:49,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.987e+01 3.332e+01 3.744e+01 5.839e+01, threshold=6.663e+01, percent-clipped=0.0 2024-08-10 11:25:53,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=526970.0, ans=0.125 2024-08-10 11:26:04,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=527070.0, ans=0.125 2024-08-10 11:26:17,479 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 21 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-10 11:26:22,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=527170.0, ans=0.125 2024-08-10 11:26:23,801 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 11:26:24,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9250, loss[loss=0.1034, beats_loss=0.01129, ecapa_loss=0.0003057, whisper_loss=0.08908, over 13877.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01204, ecapa_loss=0.0002652, whisper_loss=0.09585, over 3905496.55 frames. ], batch size: 60, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:26:25,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=527270.0, ans=0.125 2024-08-10 11:26:26,807 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:26:28,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=527270.0, ans=0.125 2024-08-10 11:26:41,611 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 11:26:45,918 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 11:26:48,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=527370.0, ans=0.2 2024-08-10 11:26:57,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527470.0, ans=0.1 2024-08-10 11:26:59,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=527470.0, ans=15.0 2024-08-10 11:27:06,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527570.0, ans=0.1 2024-08-10 11:27:08,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2024-08-10 11:27:13,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=527570.0, ans=0.0 2024-08-10 11:27:19,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=527670.0, ans=0.0 2024-08-10 11:27:30,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9300, loss[loss=0.1066, beats_loss=0.01449, ecapa_loss=0.0002646, whisper_loss=0.08942, over 18507.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01209, ecapa_loss=0.0002633, whisper_loss=0.09519, over 3893926.45 frames. ], batch size: 78, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:27:41,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527770.0, ans=0.1 2024-08-10 11:27:47,467 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 11:27:54,434 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 11:27:57,842 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 11:27:58,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=527970.0, ans=0.125 2024-08-10 11:28:02,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.982e+01 3.468e+01 4.140e+01 6.249e+01, threshold=6.936e+01, percent-clipped=0.0 2024-08-10 11:28:34,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=528170.0, ans=0.125 2024-08-10 11:28:41,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9350, loss[loss=0.1121, beats_loss=0.01102, ecapa_loss=0.0002446, whisper_loss=0.09865, over 18945.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01208, ecapa_loss=0.0002611, whisper_loss=0.09558, over 3886686.15 frames. ], batch size: 73, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:28:53,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2024-08-10 11:29:03,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2024-08-10 11:29:05,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=528370.0, ans=0.125 2024-08-10 11:29:19,759 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 11:29:22,782 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 11:29:27,093 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 11:29:33,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-10 11:29:41,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=528670.0, ans=0.125 2024-08-10 11:29:44,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=528670.0, ans=0.1 2024-08-10 11:29:53,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9400, loss[loss=0.09653, beats_loss=0.0139, ecapa_loss=0.0002979, whisper_loss=0.07966, over 20359.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0121, ecapa_loss=0.0002621, whisper_loss=0.09555, over 3893878.00 frames. ], batch size: 87, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:30:01,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=528770.0, ans=0.125 2024-08-10 11:30:03,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-10 11:30:18,785 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 11:30:27,201 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 11:30:30,830 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
15 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 11:30:32,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 3.113e+01 3.432e+01 4.042e+01 8.997e+01, threshold=6.863e+01, percent-clipped=2.0 2024-08-10 11:30:47,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=529070.0, ans=0.0 2024-08-10 11:30:58,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529170.0, ans=0.125 2024-08-10 11:31:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529170.0, ans=0.125 2024-08-10 11:31:05,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=529170.0, ans=0.125 2024-08-10 11:31:07,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529170.0, ans=0.0 2024-08-10 11:31:10,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9450, loss[loss=0.1045, beats_loss=0.01263, ecapa_loss=0.0002736, whisper_loss=0.08918, over 22892.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01215, ecapa_loss=0.0002629, whisper_loss=0.09549, over 3890708.92 frames. ], batch size: 91, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:31:10,580 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 11:31:13,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529270.0, ans=0.1 2024-08-10 11:31:26,407 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 11:31:38,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=529470.0, ans=0.125 2024-08-10 11:31:40,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=529470.0, ans=0.125 2024-08-10 11:32:14,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=529670.0, ans=0.1 2024-08-10 11:32:18,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529670.0, ans=0.1 2024-08-10 11:32:27,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9500, loss[loss=0.1149, beats_loss=0.01218, ecapa_loss=0.0002276, whisper_loss=0.1004, over 19412.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01211, ecapa_loss=0.0002632, whisper_loss=0.09564, over 3900594.47 frames. ], batch size: 75, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:32:56,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=529970.0, ans=0.5 2024-08-10 11:33:00,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.898e+01 3.217e+01 3.700e+01 5.976e+01, threshold=6.434e+01, percent-clipped=0.0 2024-08-10 11:33:01,142 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 11:33:02,414 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 11:33:10,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=530070.0, ans=0.125 2024-08-10 11:33:10,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=12.0 2024-08-10 11:33:13,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2024-08-10 11:33:14,022 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 11:33:14,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2024-08-10 11:33:20,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=530070.0, ans=0.125 2024-08-10 11:33:26,470 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-10 11:33:29,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=530170.0, ans=0.2 2024-08-10 11:33:38,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9550, loss[loss=0.1031, beats_loss=0.01021, ecapa_loss=0.0002382, whisper_loss=0.09056, over 17096.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.0121, ecapa_loss=0.0002637, whisper_loss=0.09543, over 3862456.34 frames. ], batch size: 65, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:33:47,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=530270.0, ans=0.0 2024-08-10 11:34:12,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=530470.0, ans=0.125 2024-08-10 11:34:20,314 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 11:34:20,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=530570.0, ans=0.95 2024-08-10 11:34:23,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530570.0, ans=0.1 2024-08-10 11:34:26,985 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 11:34:38,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-10 11:34:45,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9600, loss[loss=0.1098, beats_loss=0.0127, ecapa_loss=0.0002372, whisper_loss=0.0947, over 20220.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01208, ecapa_loss=0.0002625, whisper_loss=0.09517, over 3836630.75 frames. ], batch size: 79, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:34:52,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=530770.0, ans=0.125 2024-08-10 11:34:59,252 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 11:35:03,409 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 11:35:16,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.859e+01 3.374e+01 4.050e+01 6.854e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 11:35:26,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 11:35:27,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=531070.0, ans=0.125 2024-08-10 11:35:28,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=531070.0, ans=0.125 2024-08-10 11:35:41,551 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-10 11:35:41,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531170.0, ans=0.125 2024-08-10 11:35:49,061 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-10 11:35:50,316 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 11:35:51,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9650, loss[loss=0.1228, beats_loss=0.01084, ecapa_loss=0.0002736, whisper_loss=0.1092, over 16411.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01209, ecapa_loss=0.0002613, whisper_loss=0.09585, over 3822111.86 frames. ], batch size: 64, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:36:12,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=531370.0, ans=0.2 2024-08-10 11:36:20,820 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 11:36:22,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5 2024-08-10 11:36:23,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=531470.0, ans=0.2 2024-08-10 11:36:29,329 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 11:36:34,407 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 11:36:38,663 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 11:36:48,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=531670.0, ans=0.125 2024-08-10 11:36:55,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9700, loss[loss=0.1285, beats_loss=0.01145, ecapa_loss=0.0003155, whisper_loss=0.1139, over 20888.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01198, ecapa_loss=0.0002629, whisper_loss=0.09603, over 3848173.98 frames. ], batch size: 86, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:36:59,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531770.0, ans=0.1 2024-08-10 11:37:07,969 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 11:37:13,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531870.0, ans=0.1 2024-08-10 11:37:13,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=531870.0, ans=0.04949747468305833 2024-08-10 11:37:14,067 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 11:37:20,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=531970.0, ans=0.025 2024-08-10 11:37:24,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.979e+01 3.402e+01 3.794e+01 6.549e+01, threshold=6.804e+01, percent-clipped=0.0 2024-08-10 11:37:26,039 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 11:37:31,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2024-08-10 11:37:42,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-10 11:37:48,445 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 11:37:52,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=532170.0, ans=0.0 2024-08-10 11:37:59,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=532270.0, ans=0.07 2024-08-10 11:38:00,168 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9750, loss[loss=0.14, beats_loss=0.01133, ecapa_loss=0.000282, whisper_loss=0.1259, over 22266.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01195, ecapa_loss=0.0002636, whisper_loss=0.09657, over 3846776.64 frames. 
], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:38:04,563 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:38:05,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=532270.0, ans=0.0 2024-08-10 11:38:07,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532270.0, ans=0.1 2024-08-10 11:38:10,652 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 11:38:12,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2024-08-10 11:38:34,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-10 11:38:46,832 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 11:39:05,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532770.0, ans=0.125 2024-08-10 11:39:06,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9800, loss[loss=0.1067, beats_loss=0.01554, ecapa_loss=0.0001947, whisper_loss=0.08925, over 22987.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002624, whisper_loss=0.09619, over 3841159.39 frames. ], batch size: 89, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:39:09,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. 
limit=22.5 2024-08-10 11:39:13,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=532770.0, ans=0.0 2024-08-10 11:39:35,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=532970.0, ans=0.0 2024-08-10 11:39:36,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.053e+01 3.396e+01 3.815e+01 6.772e+01, threshold=6.792e+01, percent-clipped=0.0 2024-08-10 11:39:40,149 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 11:39:46,850 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 11:39:56,277 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.078e+00 2024-08-10 11:39:56,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=533070.0, ans=0.125 2024-08-10 11:40:02,261 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 11:40:11,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9850, loss[loss=0.1048, beats_loss=0.01344, ecapa_loss=0.0002402, whisper_loss=0.08893, over 20344.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01209, ecapa_loss=0.0002616, whisper_loss=0.09667, over 3881947.78 frames. 
], batch size: 86, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:40:11,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533270.0, ans=0.1 2024-08-10 11:40:16,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=533270.0, ans=0.2 2024-08-10 11:40:20,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=533270.0, ans=0.015 2024-08-10 11:40:20,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533270.0, ans=0.125 2024-08-10 11:40:21,877 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 11:40:33,352 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 11:40:44,703 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 11:40:45,910 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 11:40:49,712 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 11:40:51,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=533570.0, ans=0.2 2024-08-10 11:40:52,277 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 11:40:55,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=533570.0, ans=0.0 2024-08-10 11:40:55,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=533570.0, ans=0.07 2024-08-10 11:40:57,391 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 11:41:01,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=533670.0, ans=0.0 2024-08-10 11:41:15,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9900, loss[loss=0.1101, beats_loss=0.01184, ecapa_loss=0.0002244, whisper_loss=0.09596, over 15805.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01214, ecapa_loss=0.0002622, whisper_loss=0.09607, over 3910617.46 frames. ], batch size: 61, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:41:23,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533770.0, ans=0.1 2024-08-10 11:41:29,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533870.0, ans=0.125 2024-08-10 11:41:37,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-10 11:41:38,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533870.0, ans=0.1 2024-08-10 11:41:39,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=533870.0, ans=0.125 2024-08-10 11:41:45,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.867e+01 3.339e+01 3.780e+01 5.864e+01, threshold=6.678e+01, percent-clipped=0.0 2024-08-10 11:41:47,188 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 11:41:47,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=533970.0, ans=0.1 2024-08-10 11:41:48,448 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
38 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 11:41:52,237 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 11:41:57,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=534070.0, ans=0.1 2024-08-10 11:42:01,643 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-10 11:42:05,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=534070.0, ans=0.2 2024-08-10 11:42:07,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=534170.0, ans=0.0 2024-08-10 11:42:16,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=534170.0, ans=0.0 2024-08-10 11:42:20,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 9950, loss[loss=0.1188, beats_loss=0.01118, ecapa_loss=0.0003367, whisper_loss=0.1042, over 22239.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01203, ecapa_loss=0.0002649, whisper_loss=0.09634, over 3884462.23 frames. ], batch size: 91, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:42:41,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=534370.0, ans=0.125 2024-08-10 11:42:54,557 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 11:42:56,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534470.0, ans=0.1 2024-08-10 11:42:59,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.55 vs. 
limit=5.0 2024-08-10 11:43:01,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=534570.0, ans=0.0 2024-08-10 11:43:01,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-10 11:43:16,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534670.0, ans=0.125 2024-08-10 11:43:21,691 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 11:43:25,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10000, loss[loss=0.109, beats_loss=0.0129, ecapa_loss=0.000236, whisper_loss=0.09371, over 19525.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01204, ecapa_loss=0.0002636, whisper_loss=0.09629, over 3869530.79 frames. ], batch size: 76, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:43:27,989 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 11:43:43,512 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 11:43:46,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=534870.0, ans=0.0 2024-08-10 11:43:48,812 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 11:43:55,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.932e+01 3.270e+01 3.845e+01 5.958e+01, threshold=6.541e+01, percent-clipped=0.0 2024-08-10 11:44:02,551 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.993e-02 2024-08-10 11:44:07,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535070.0, ans=0.1 2024-08-10 11:44:30,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10050, loss[loss=0.1195, beats_loss=0.01093, ecapa_loss=0.0002104, whisper_loss=0.1064, over 23213.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01212, ecapa_loss=0.000262, whisper_loss=0.09507, over 3871816.22 frames. ], batch size: 90, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:44:32,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=535270.0, ans=0.125 2024-08-10 11:44:44,943 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-10 11:44:52,864 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 11:45:00,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-10 11:45:13,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535570.0, ans=0.1 2024-08-10 11:45:35,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10100, loss[loss=0.1105, beats_loss=0.01259, ecapa_loss=0.000299, whisper_loss=0.09488, over 21104.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01213, ecapa_loss=0.000265, whisper_loss=0.09502, over 3865791.82 frames. 
], batch size: 89, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:45:43,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2024-08-10 11:45:54,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=535870.0, ans=0.125 2024-08-10 11:46:05,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.067e+01 3.527e+01 4.291e+01 1.159e+02, threshold=7.053e+01, percent-clipped=2.0 2024-08-10 11:46:15,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-10 11:46:19,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=536070.0, ans=0.2 2024-08-10 11:46:34,390 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 11:46:40,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0 2024-08-10 11:46:40,420 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10150, loss[loss=0.1282, beats_loss=0.01018, ecapa_loss=0.0003104, whisper_loss=0.1149, over 22468.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01213, ecapa_loss=0.000265, whisper_loss=0.09471, over 3864297.02 frames. ], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:46:49,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.95 vs. 
limit=22.5 2024-08-10 11:46:50,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=536270.0, ans=15.0 2024-08-10 11:46:57,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=536370.0, ans=0.125 2024-08-10 11:47:05,823 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 11:47:50,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=536670.0, ans=0.125 2024-08-10 11:47:50,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=536670.0, ans=0.125 2024-08-10 11:47:57,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10200, loss[loss=0.1051, beats_loss=0.01243, ecapa_loss=0.0002015, whisper_loss=0.09064, over 21138.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01206, ecapa_loss=0.0002633, whisper_loss=0.09526, over 3902541.10 frames. ], batch size: 82, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:48:03,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=536770.0, ans=0.125 2024-08-10 11:48:09,785 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 11:48:20,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=536870.0, ans=0.0 2024-08-10 11:48:32,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=536970.0, ans=0.0 2024-08-10 11:48:34,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 3.065e+01 3.411e+01 3.914e+01 6.071e+01, threshold=6.821e+01, percent-clipped=0.0 2024-08-10 11:48:35,336 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 15 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 11:49:06,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=537170.0, ans=0.125 2024-08-10 11:49:20,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10250, loss[loss=0.1166, beats_loss=0.0105, ecapa_loss=0.000368, whisper_loss=0.1024, over 21135.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01201, ecapa_loss=0.0002636, whisper_loss=0.09515, over 3862603.95 frames. ], batch size: 89, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:49:41,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2024-08-10 11:50:14,415 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 11:50:29,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=537670.0, ans=0.125 2024-08-10 11:50:37,794 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 11:50:44,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10300, loss[loss=0.1219, beats_loss=0.01228, ecapa_loss=0.0002674, whisper_loss=0.1069, over 17141.00 frames. 
], tot_loss[loss=0.1103, beats_loss=0.01197, ecapa_loss=0.0002645, whisper_loss=0.09564, over 3882787.10 frames. ], batch size: 66, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:50:56,889 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 11:51:06,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2024-08-10 11:51:18,247 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 11:51:20,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.115e+01 3.523e+01 4.089e+01 1.199e+02, threshold=7.045e+01, percent-clipped=1.0 2024-08-10 11:51:26,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537970.0, ans=0.125 2024-08-10 11:51:40,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=538070.0, ans=0.0 2024-08-10 11:51:52,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=538170.0, ans=0.0 2024-08-10 11:51:53,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-10 11:51:59,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538170.0, ans=0.1 2024-08-10 11:52:04,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10350, loss[loss=0.1206, beats_loss=0.01103, ecapa_loss=0.0002467, whisper_loss=0.1071, over 23554.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01201, ecapa_loss=0.0002632, whisper_loss=0.09547, over 3905230.45 frames. 
], batch size: 92, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:52:18,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=538270.0, ans=0.2 2024-08-10 11:52:51,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538470.0, ans=0.1 2024-08-10 11:52:52,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=538570.0, ans=0.0 2024-08-10 11:53:16,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=538670.0, ans=0.2 2024-08-10 11:53:23,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2024-08-10 11:53:25,071 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10400, loss[loss=0.07966, beats_loss=0.01186, ecapa_loss=0.0002979, whisper_loss=0.06482, over 14290.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01201, ecapa_loss=0.0002624, whisper_loss=0.09555, over 3890716.37 frames. ], batch size: 60, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:53:26,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5 2024-08-10 11:54:01,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.892e+01 3.209e+01 3.631e+01 5.476e+01, threshold=6.418e+01, percent-clipped=0.0 2024-08-10 11:54:10,029 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 11:54:12,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=539070.0, ans=0.125 2024-08-10 11:54:12,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=539070.0, ans=0.0 2024-08-10 11:54:16,761 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 11:54:26,090 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 11:54:35,618 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 11:54:39,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=539170.0, ans=0.0 2024-08-10 11:54:43,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=539270.0, ans=0.0 2024-08-10 11:54:44,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10450, loss[loss=0.1144, beats_loss=0.01076, ecapa_loss=0.0002998, whisper_loss=0.1006, over 22027.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01201, ecapa_loss=0.0002623, whisper_loss=0.09582, over 3900043.86 frames. ], batch size: 90, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:54:48,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=539270.0, ans=0.125 2024-08-10 11:55:22,868 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-10 11:55:24,864 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 11:55:32,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.78 vs. 
limit=22.5 2024-08-10 11:55:49,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=539670.0, ans=0.125 2024-08-10 11:55:56,840 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 11:56:01,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10500, loss[loss=0.09957, beats_loss=0.01173, ecapa_loss=0.0003072, whisper_loss=0.08477, over 22249.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01202, ecapa_loss=0.0002615, whisper_loss=0.09614, over 3887538.56 frames. ], batch size: 93, lr: 1.45e-02, grad_scale: 2147483648.0 2024-08-10 11:56:06,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=539770.0, ans=0.0 2024-08-10 11:56:25,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539870.0, ans=0.125 2024-08-10 11:56:28,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=539970.0, ans=0.2 2024-08-10 11:56:33,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 3.038e+01 3.459e+01 3.996e+01 6.342e+01, threshold=6.919e+01, percent-clipped=0.0 2024-08-10 11:56:37,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=10.0 2024-08-10 11:56:39,940 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:56:41,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=539970.0, ans=0.0 2024-08-10 11:56:43,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=540070.0, ans=0.0 2024-08-10 11:57:09,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10550, loss[loss=0.1036, beats_loss=0.01418, ecapa_loss=0.0002354, whisper_loss=0.08706, over 23143.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01202, ecapa_loss=0.000263, whisper_loss=0.09588, over 3854328.81 frames. ], batch size: 91, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:57:15,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=540270.0, ans=0.0 2024-08-10 11:57:16,108 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 11:57:17,819 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-10 11:57:31,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=540370.0, ans=0.95 2024-08-10 11:57:32,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=540370.0, ans=0.125 2024-08-10 11:57:48,224 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 11:57:52,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=540570.0, ans=10.0 2024-08-10 11:57:57,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=540570.0, ans=15.0 2024-08-10 11:57:59,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540570.0, ans=0.125 2024-08-10 11:58:02,873 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 11:58:09,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=540670.0, ans=0.2 2024-08-10 11:58:13,320 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 11:58:18,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10600, loss[loss=0.1213, beats_loss=0.008738, ecapa_loss=0.0003583, whisper_loss=0.109, over 16946.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01209, ecapa_loss=0.0002627, whisper_loss=0.09491, over 3869321.71 frames. ], batch size: 70, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:58:22,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-08-10 11:58:24,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=540770.0, ans=0.0 2024-08-10 11:58:26,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=540770.0, ans=0.2 2024-08-10 11:58:31,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. 
limit=10.0 2024-08-10 11:58:35,274 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 11:58:40,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-08-10 11:58:40,634 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 18 from LS+wenet, 34 from Vox, 42 fro AS 2024-08-10 11:58:44,653 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 11:58:44,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=540970.0, ans=0.125 2024-08-10 11:58:46,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=12.0 2024-08-10 11:58:49,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.966e+01 3.321e+01 3.773e+01 6.212e+01, threshold=6.641e+01, percent-clipped=0.0 2024-08-10 11:58:51,015 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 11:58:52,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2024-08-10 11:58:54,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.07 vs. limit=15.0 2024-08-10 11:58:54,788 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 11:59:04,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-10 11:59:20,144 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 11:59:22,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2024-08-10 11:59:25,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10650, loss[loss=0.125, beats_loss=0.01005, ecapa_loss=0.0002419, whisper_loss=0.1126, over 20792.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01213, ecapa_loss=0.0002626, whisper_loss=0.09445, over 3852946.76 frames. ], batch size: 78, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 11:59:25,517 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 11:59:29,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541270.0, ans=0.125 2024-08-10 11:59:39,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-10 11:59:40,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541370.0, ans=0.1 2024-08-10 11:59:41,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-10 11:59:59,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=541470.0, ans=0.125 2024-08-10 12:00:05,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=541570.0, ans=0.125 2024-08-10 12:00:11,466 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 12:00:11,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=541570.0, ans=0.05 2024-08-10 12:00:15,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=541570.0, ans=0.125 2024-08-10 12:00:31,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10700, loss[loss=0.1076, beats_loss=0.0114, ecapa_loss=0.0002479, whisper_loss=0.09375, over 22892.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0122, ecapa_loss=0.0002603, whisper_loss=0.09509, over 3863854.92 frames. ], batch size: 90, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:00:37,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=541770.0, ans=0.0 2024-08-10 12:00:41,654 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 12:00:43,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=541770.0, ans=0.0 2024-08-10 12:00:52,046 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 12:01:03,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 3.156e+01 3.555e+01 4.088e+01 6.627e+01, threshold=7.109e+01, percent-clipped=0.0 2024-08-10 12:01:08,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=541970.0, ans=0.2 2024-08-10 12:01:39,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10750, loss[loss=0.1175, beats_loss=0.01258, ecapa_loss=0.0002847, whisper_loss=0.1021, over 17865.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01222, ecapa_loss=0.000261, whisper_loss=0.09553, over 3898875.68 frames. 
], batch size: 70, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:01:39,540 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 12:01:40,670 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 12:01:46,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=542270.0, ans=0.125 2024-08-10 12:01:50,072 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 12:01:50,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=542270.0, ans=0.0 2024-08-10 12:01:51,427 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 12:01:55,847 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 12:02:01,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=542370.0, ans=0.125 2024-08-10 12:02:02,863 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 12:02:13,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=542470.0, ans=0.125 2024-08-10 12:02:24,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2024-08-10 12:02:25,000 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 12:02:29,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. 
limit=15.0 2024-08-10 12:02:33,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=542670.0, ans=0.0 2024-08-10 12:02:34,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=542670.0, ans=0.125 2024-08-10 12:02:45,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10800, loss[loss=0.1166, beats_loss=0.009341, ecapa_loss=0.0003029, whisper_loss=0.1042, over 19244.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01217, ecapa_loss=0.0002596, whisper_loss=0.09605, over 3903198.87 frames. ], batch size: 77, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:02:55,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=542770.0, ans=0.125 2024-08-10 12:02:57,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.13 vs. limit=15.0 2024-08-10 12:03:00,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.19 vs. 
limit=15.0 2024-08-10 12:03:06,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=542870.0, ans=0.125 2024-08-10 12:03:17,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.065e+01 3.589e+01 4.278e+01 6.968e+01, threshold=7.178e+01, percent-clipped=0.0 2024-08-10 12:03:20,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=542970.0, ans=0.125 2024-08-10 12:03:24,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542970.0, ans=0.125 2024-08-10 12:03:29,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543070.0, ans=0.1 2024-08-10 12:03:32,796 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 12:03:34,129 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 12:03:38,020 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 12:03:53,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10850, loss[loss=0.1118, beats_loss=0.01161, ecapa_loss=0.000264, whisper_loss=0.09752, over 19376.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0121, ecapa_loss=0.0002597, whisper_loss=0.09655, over 3922602.28 frames. ], batch size: 79, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:04:02,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=543270.0, ans=0.0 2024-08-10 12:04:12,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=543370.0, ans=0.2 2024-08-10 12:04:28,112 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 12:04:29,400 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 12:04:32,549 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 12:04:33,794 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 12:04:37,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=543570.0, ans=0.125 2024-08-10 12:04:40,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=543570.0, ans=0.0 2024-08-10 12:04:48,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=543670.0, ans=0.2 2024-08-10 12:05:03,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10900, loss[loss=0.1131, beats_loss=0.01113, ecapa_loss=0.0002867, whisper_loss=0.09911, over 23605.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01209, ecapa_loss=0.0002593, whisper_loss=0.09658, over 3953937.96 frames. ], batch size: 93, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:05:30,636 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 12:05:34,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=543970.0, ans=0.125 2024-08-10 12:05:35,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 3.064e+01 3.469e+01 3.864e+01 6.688e+01, threshold=6.938e+01, percent-clipped=0.0 2024-08-10 12:05:37,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=543970.0, ans=0.0 2024-08-10 12:05:39,876 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 12:05:40,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=543970.0, ans=0.0 2024-08-10 12:06:09,650 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 12:06:09,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=544170.0, ans=0.0 2024-08-10 12:06:13,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 10950, loss[loss=0.1051, beats_loss=0.0136, ecapa_loss=0.0002759, whisper_loss=0.08877, over 21461.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01211, ecapa_loss=0.0002584, whisper_loss=0.09673, over 3967842.62 frames. ], batch size: 87, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:06:19,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544270.0, ans=0.1 2024-08-10 12:06:31,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=544370.0, ans=0.125 2024-08-10 12:06:49,919 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 12:06:51,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=544470.0, ans=0.125 2024-08-10 12:06:59,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=544570.0, ans=0.0 2024-08-10 12:07:03,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=12.0 2024-08-10 12:07:08,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=544670.0, ans=0.0 2024-08-10 12:07:11,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=544670.0, ans=0.125 2024-08-10 12:07:15,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-10 12:07:16,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=544670.0, ans=0.2 2024-08-10 12:07:16,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5 2024-08-10 12:07:19,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11000, loss[loss=0.1149, beats_loss=0.01223, ecapa_loss=0.0002631, whisper_loss=0.09999, over 21158.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01204, ecapa_loss=0.0002595, whisper_loss=0.0966, over 3952135.32 frames. ], batch size: 84, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:07:21,758 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:07:25,200 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 12:07:37,042 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 12:07:39,681 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
40 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 12:07:50,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.903e+01 3.309e+01 3.802e+01 5.297e+01, threshold=6.618e+01, percent-clipped=0.0 2024-08-10 12:07:56,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=544970.0, ans=0.125 2024-08-10 12:07:59,521 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 12:08:09,883 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 12:08:17,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=545170.0, ans=0.05 2024-08-10 12:08:24,532 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 12:08:25,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11050, loss[loss=0.1112, beats_loss=0.01085, ecapa_loss=0.0002538, whisper_loss=0.09782, over 18138.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01201, ecapa_loss=0.0002605, whisper_loss=0.09688, over 3961627.35 frames. ], batch size: 74, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:08:34,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=545270.0, ans=0.125 2024-08-10 12:08:55,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.90 vs. 
limit=12.0 2024-08-10 12:09:00,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545470.0, ans=0.0 2024-08-10 12:09:04,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=545570.0, ans=0.125 2024-08-10 12:09:06,683 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 12:09:16,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=545570.0, ans=0.0 2024-08-10 12:09:23,563 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 12:09:26,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=545670.0, ans=0.2 2024-08-10 12:09:27,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545670.0, ans=0.1 2024-08-10 12:09:31,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11100, loss[loss=0.1288, beats_loss=0.01346, ecapa_loss=0.0002388, whisper_loss=0.113, over 22633.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01208, ecapa_loss=0.0002596, whisper_loss=0.09617, over 3966715.59 frames. ], batch size: 88, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:09:33,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=545770.0, ans=0.0 2024-08-10 12:10:02,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.025e+01 3.487e+01 4.357e+01 7.811e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:10:05,290 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 12:10:20,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=546070.0, ans=0.125 2024-08-10 12:10:25,133 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 12:10:29,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546170.0, ans=0.1 2024-08-10 12:10:33,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=546170.0, ans=0.07 2024-08-10 12:10:35,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=546170.0, ans=0.2 2024-08-10 12:10:36,051 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 12:10:38,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11150, loss[loss=0.1039, beats_loss=0.01474, ecapa_loss=0.0002112, whisper_loss=0.08705, over 15049.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01203, ecapa_loss=0.0002587, whisper_loss=0.09624, over 3902145.86 frames. 
], batch size: 58, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:10:40,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=546270.0, ans=0.0 2024-08-10 12:10:40,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=546270.0, ans=0.5 2024-08-10 12:10:57,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=546370.0, ans=0.0 2024-08-10 12:11:01,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=546370.0, ans=0.125 2024-08-10 12:11:27,406 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 12:11:27,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=546570.0, ans=0.125 2024-08-10 12:11:44,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11200, loss[loss=0.1209, beats_loss=0.01228, ecapa_loss=0.0002444, whisper_loss=0.1062, over 22838.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01203, ecapa_loss=0.0002584, whisper_loss=0.09669, over 3916564.22 frames. ], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:11:47,476 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 12:11:47,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=546770.0, ans=0.5 2024-08-10 12:11:54,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=546770.0, ans=0.09899494936611666 2024-08-10 12:12:04,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=546870.0, ans=0.1 2024-08-10 12:12:15,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 3.126e+01 3.422e+01 3.938e+01 7.786e+01, threshold=6.843e+01, percent-clipped=1.0 2024-08-10 12:12:21,956 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 27 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 12:12:24,699 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 12:12:26,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=547070.0, ans=0.0 2024-08-10 12:12:41,363 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 12:12:51,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11250, loss[loss=0.1165, beats_loss=0.0105, ecapa_loss=0.0002368, whisper_loss=0.1036, over 17313.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01194, ecapa_loss=0.0002595, whisper_loss=0.09734, over 3916195.26 frames. 
], batch size: 66, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:12:52,303 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:12:54,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=547270.0, ans=0.125 2024-08-10 12:12:57,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=547270.0, ans=0.09899494936611666 2024-08-10 12:13:00,241 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 12:13:02,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=547270.0, ans=0.125 2024-08-10 12:13:22,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.08 vs. limit=15.0 2024-08-10 12:13:35,021 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 12:13:38,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=547570.0, ans=0.1 2024-08-10 12:13:39,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=547570.0, ans=0.125 2024-08-10 12:13:41,754 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 12:13:45,810 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
35 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 12:13:48,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=547670.0, ans=0.125 2024-08-10 12:13:51,447 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.219e-01 2024-08-10 12:13:52,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.05 vs. limit=15.0 2024-08-10 12:13:58,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11300, loss[loss=0.1209, beats_loss=0.01048, ecapa_loss=0.0002637, whisper_loss=0.1078, over 22610.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01187, ecapa_loss=0.00026, whisper_loss=0.09715, over 3892926.83 frames. ], batch size: 87, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:14:00,653 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 12:14:15,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=547870.0, ans=0.125 2024-08-10 12:14:19,626 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 12:14:26,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=547970.0, ans=0.0 2024-08-10 12:14:26,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=547970.0, ans=0.07 2024-08-10 12:14:30,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.068e+01 3.483e+01 4.119e+01 9.369e+01, threshold=6.966e+01, percent-clipped=1.0 2024-08-10 12:14:37,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. 
limit=12.0 2024-08-10 12:14:40,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=548070.0, ans=0.0 2024-08-10 12:14:48,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=548070.0, ans=0.125 2024-08-10 12:14:58,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548170.0, ans=0.125 2024-08-10 12:15:05,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11350, loss[loss=0.1164, beats_loss=0.01216, ecapa_loss=0.0003451, whisper_loss=0.1008, over 21313.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01176, ecapa_loss=0.0002617, whisper_loss=0.0974, over 3907310.16 frames. ], batch size: 89, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:15:08,714 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.243e-01 2024-08-10 12:15:09,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548270.0, ans=0.1 2024-08-10 12:15:13,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548270.0, ans=0.1 2024-08-10 12:15:31,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.59 vs. limit=5.0 2024-08-10 12:15:40,417 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
25 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 12:15:44,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548570.0, ans=0.1 2024-08-10 12:15:54,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=548570.0, ans=0.2 2024-08-10 12:15:54,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=548570.0, ans=10.0 2024-08-10 12:16:11,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11400, loss[loss=0.08021, beats_loss=0.01551, ecapa_loss=0.0002947, whisper_loss=0.06176, over 16996.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01182, ecapa_loss=0.0002609, whisper_loss=0.09837, over 3917853.53 frames. ], batch size: 74, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:16:12,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=548770.0, ans=0.125 2024-08-10 12:16:19,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=548770.0, ans=0.0 2024-08-10 12:16:24,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=548870.0, ans=0.09899494936611666 2024-08-10 12:16:29,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=548870.0, ans=0.125 2024-08-10 12:16:31,976 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 12:16:42,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.940e+01 3.301e+01 3.929e+01 5.377e+01, threshold=6.601e+01, percent-clipped=0.0 2024-08-10 12:16:59,854 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
32 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-10 12:17:00,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=549070.0, ans=0.2 2024-08-10 12:17:05,236 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 12:17:05,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=549170.0, ans=0.0 2024-08-10 12:17:05,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-10 12:17:08,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=549170.0, ans=0.125 2024-08-10 12:17:09,314 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 12:17:12,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2024-08-10 12:17:18,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11450, loss[loss=0.1168, beats_loss=0.01385, ecapa_loss=0.0002336, whisper_loss=0.1006, over 22259.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01183, ecapa_loss=0.0002602, whisper_loss=0.09807, over 3896739.86 frames. ], batch size: 90, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:17:18,852 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 12:17:28,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=549270.0, ans=0.125 2024-08-10 12:17:30,752 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 12:17:40,262 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 12:17:55,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=22.5 2024-08-10 12:18:08,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=549570.0, ans=0.0 2024-08-10 12:18:08,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=549570.0, ans=0.125 2024-08-10 12:18:24,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=549670.0, ans=0.0 2024-08-10 12:18:26,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11500, loss[loss=0.1174, beats_loss=0.0114, ecapa_loss=0.0002674, whisper_loss=0.1033, over 20936.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01192, ecapa_loss=0.0002586, whisper_loss=0.09806, over 3928341.93 frames. ], batch size: 84, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:18:38,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=549870.0, ans=0.125 2024-08-10 12:18:45,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=549870.0, ans=0.0 2024-08-10 12:18:51,053 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-10 12:18:56,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.942e+01 3.473e+01 3.989e+01 7.170e+01, threshold=6.945e+01, percent-clipped=1.0 2024-08-10 12:19:32,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11550, loss[loss=0.1032, beats_loss=0.01491, ecapa_loss=0.0002245, whisper_loss=0.086, over 23134.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01192, ecapa_loss=0.0002587, whisper_loss=0.09742, over 3904829.10 frames. 
], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:19:32,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=550270.0, ans=0.125 2024-08-10 12:19:47,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=550370.0, ans=0.125 2024-08-10 12:19:59,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=550470.0, ans=0.125 2024-08-10 12:20:03,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=550470.0, ans=0.04949747468305833 2024-08-10 12:20:13,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550570.0, ans=0.125 2024-08-10 12:20:24,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=550670.0, ans=0.025 2024-08-10 12:20:38,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11600, loss[loss=0.1209, beats_loss=0.01036, ecapa_loss=0.0003135, whisper_loss=0.1074, over 15934.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01189, ecapa_loss=0.0002593, whisper_loss=0.09698, over 3902447.58 frames. ], batch size: 66, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:20:38,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550770.0, ans=0.1 2024-08-10 12:20:43,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=22.5 2024-08-10 12:21:07,452 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 12:21:08,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.001e+01 3.473e+01 4.016e+01 7.053e+01, threshold=6.947e+01, percent-clipped=1.0 2024-08-10 12:21:09,968 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 12:21:13,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=550970.0, ans=0.125 2024-08-10 12:21:22,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=551070.0, ans=0.2 2024-08-10 12:21:25,721 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 12:21:46,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11650, loss[loss=0.138, beats_loss=0.01009, ecapa_loss=0.0002438, whisper_loss=0.1254, over 23053.00 frames. ], tot_loss[loss=0.112, beats_loss=0.0119, ecapa_loss=0.000259, whisper_loss=0.09747, over 3926175.10 frames. ], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:22:04,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551370.0, ans=0.125 2024-08-10 12:22:31,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551570.0, ans=0.125 2024-08-10 12:22:57,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11700, loss[loss=0.1247, beats_loss=0.01026, ecapa_loss=0.0003558, whisper_loss=0.1109, over 21420.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01187, ecapa_loss=0.0002593, whisper_loss=0.09765, over 3908444.99 frames. 
], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:23:00,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=15.0 2024-08-10 12:23:07,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-10 12:23:10,457 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.442e-02 2024-08-10 12:23:17,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551870.0, ans=0.0 2024-08-10 12:23:18,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551870.0, ans=0.1 2024-08-10 12:23:21,316 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 12:23:33,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 3.233e+01 3.487e+01 4.046e+01 6.995e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:23:43,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=552070.0, ans=0.2 2024-08-10 12:23:45,145 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
15 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 12:23:59,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552170.0, ans=0.1 2024-08-10 12:24:03,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=552170.0, ans=0.0 2024-08-10 12:24:06,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=552170.0, ans=0.125 2024-08-10 12:24:07,603 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 12:24:13,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11750, loss[loss=0.09709, beats_loss=0.01204, ecapa_loss=0.0002655, whisper_loss=0.0824, over 20970.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01192, ecapa_loss=0.0002591, whisper_loss=0.09683, over 3903051.88 frames. ], batch size: 85, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:24:26,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=552270.0, ans=0.125 2024-08-10 12:24:27,927 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 12:24:32,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=552370.0, ans=0.2 2024-08-10 12:24:36,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=552370.0, ans=0.125 2024-08-10 12:24:49,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=552470.0, ans=0.0 2024-08-10 12:24:52,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=552470.0, ans=0.0 2024-08-10 12:24:54,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=552470.0, ans=0.125 2024-08-10 12:24:56,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=552470.0, ans=0.2 2024-08-10 12:25:11,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=552570.0, ans=0.125 2024-08-10 12:25:29,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11800, loss[loss=0.1077, beats_loss=0.009339, ecapa_loss=0.0003207, whisper_loss=0.09513, over 13653.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01202, ecapa_loss=0.0002585, whisper_loss=0.09656, over 3882091.63 frames. ], batch size: 55, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:25:40,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.02 vs. 
limit=15.0 2024-08-10 12:26:04,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.876e+01 3.467e+01 4.028e+01 7.288e+01, threshold=6.933e+01, percent-clipped=1.0 2024-08-10 12:26:17,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=553070.0, ans=0.2 2024-08-10 12:26:19,619 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 12:26:44,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11850, loss[loss=0.1024, beats_loss=0.01084, ecapa_loss=0.0002794, whisper_loss=0.08878, over 15749.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01203, ecapa_loss=0.0002592, whisper_loss=0.09632, over 3882957.00 frames. ], batch size: 63, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:26:47,814 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 12:26:53,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=553270.0, ans=0.125 2024-08-10 12:26:55,496 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 12:26:57,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.59 vs. limit=15.0 2024-08-10 12:27:04,075 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 12:27:15,347 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
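[Editor's note] The header config sets `beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`, and the logged totals are consistent with `loss = beats + 10*ecapa + whisper` (e.g. the batch-11850 record above: 0.01084 + 10 × 0.0002794 + 0.08878 ≈ 0.1024). A sketch of that weighted combination; the scales are read from the log header, the function itself is illustrative:

```python
def combine_kd_losses(beats, ecapa, whisper,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three distillation losses.

    Default scales match beats_loss_scale / ecapa_loss_scale /
    whisper_loss_scale in the training config logged at startup.
    """
    return beats_scale * beats + ecapa_scale * ecapa + whisper_scale * whisper

# Component losses taken from the batch-11850 log record
total = combine_kd_losses(0.01084, 0.0002794, 0.08878)
```

Note the log prints the *unscaled* `ecapa_loss`; only the total reflects the 10x weight.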
21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 12:27:40,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=553570.0, ans=0.05 2024-08-10 12:27:53,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=553670.0, ans=0.125 2024-08-10 12:27:57,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11900, loss[loss=0.1104, beats_loss=0.01256, ecapa_loss=0.0002758, whisper_loss=0.09505, over 17716.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01201, ecapa_loss=0.0002587, whisper_loss=0.09692, over 3908771.43 frames. ], batch size: 75, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:28:00,438 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 12:28:09,104 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-10 12:28:17,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-10 12:28:19,713 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 12:28:22,167 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
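[Editor's note] The many `ScheduledFloat` lines log module hyperparameters (dropout probabilities, skip rates, bypass scales) as a function of `batch_count`; icefall's `ScheduledFloat` is, to a first approximation, a piecewise-linear schedule over the training batch count. A minimal re-implementation of that idea (the schedule points below are hypothetical, not taken from the recipe):

```python
def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over batch count.

    `points` is a sorted list of (batch_count, value) pairs; outside
    the range the endpoint values are held constant. This mirrors the
    spirit of icefall's ScheduledFloat, not its exact class.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a skip-rate that decays from 0.2 to 0.0 over the first 20k batches
sched = [(0.0, 0.2), (20000.0, 0.0)]
```

This explains why at `batch_count=552070` most logged skip rates have long since settled at their final values (0.0, 0.1, 0.125, ...).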
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 12:28:24,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=553870.0, ans=0.125 2024-08-10 12:28:27,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=553970.0, ans=0.2 2024-08-10 12:28:30,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.028e+01 3.380e+01 3.794e+01 5.730e+01, threshold=6.759e+01, percent-clipped=0.0 2024-08-10 12:28:35,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=553970.0, ans=0.125 2024-08-10 12:28:42,906 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 12:29:06,351 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 12:29:10,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 11950, loss[loss=0.09919, beats_loss=0.01207, ecapa_loss=0.0003225, whisper_loss=0.08389, over 17899.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.012, ecapa_loss=0.0002593, whisper_loss=0.09631, over 3881249.82 frames. ], batch size: 71, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:29:11,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=554270.0, ans=0.035 2024-08-10 12:29:11,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=554270.0, ans=0.02 2024-08-10 12:29:26,040 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 12:29:46,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=554470.0, ans=0.2 2024-08-10 12:29:48,506 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 12:30:02,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=554570.0, ans=0.125 2024-08-10 12:30:08,191 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 12:30:22,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12000, loss[loss=0.09952, beats_loss=0.01312, ecapa_loss=0.0002175, whisper_loss=0.08423, over 14447.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01195, ecapa_loss=0.0002594, whisper_loss=0.09739, over 3913181.60 frames. ], batch size: 57, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:30:22,881 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 12:30:59,956 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on ASR_libri: loss=0.2637, beats_loss=0, ecapa_loss=0.0007919, whisper_loss=0.2558, over 922467.00 frames. 2024-08-10 12:31:17,051 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on SV_voxceleb1: loss=0.006895, beats_loss=0, ecapa_loss=0.0006895, whisper_loss=0, over 939242.00 frames. 2024-08-10 12:32:02,058 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8952, 1.4007, 1.9178, 2.0966], device='cuda:3') 2024-08-10 12:32:51,753 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6063, 1.4530, 1.4932, 1.0747], device='cuda:3') 2024-08-10 12:33:04,224 INFO [train_multi_KD3.py:1149] (3/4) Epoch 4, validation on AT_audioset: loss=0.02758, beats_loss=0.02758, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 12:33:04,228 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 12:33:13,802 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
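[Editor's note] The validation block above scores three tasks separately, with the irrelevant loss components zeroed: `ASR_libri` logs `beats_loss=0`, `SV_voxceleb1` keeps only `ecapa_loss`, and `AT_audioset` keeps only `beats_loss`. A sketch of that per-task masking; the mask table is inferred from the zeroed entries in the log, not copied from the code:

```python
# Which loss components each validation task evaluates,
# inferred from the zeroed entries in the log above.
TASK_MASKS = {
    "ASR_libri":    {"beats": 0, "ecapa": 1, "whisper": 1},
    "SV_voxceleb1": {"beats": 0, "ecapa": 1, "whisper": 0},
    "AT_audioset":  {"beats": 1, "ecapa": 0, "whisper": 0},
}

def masked_losses(task, beats, ecapa, whisper):
    """Zero out the loss components a task does not evaluate;
    return the masked components and their (unscaled) sum."""
    m = TASK_MASKS[task]
    beats, ecapa, whisper = beats * m["beats"], ecapa * m["ecapa"], whisper * m["whisper"]
    return beats, ecapa, whisper, beats + ecapa + whisper

out = masked_losses("SV_voxceleb1", 0.012, 0.0006895, 0.09)
```

Masking at the loss level lets one multi-task checkpoint be validated against each teacher's domain independently.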
20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 12:33:20,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2024-08-10 12:33:39,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.069e+01 3.344e+01 4.078e+01 6.277e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 12:33:53,875 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 12:34:19,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=555170.0, ans=0.0 2024-08-10 12:34:23,398 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 12:34:24,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12050, loss[loss=0.1, beats_loss=0.01103, ecapa_loss=0.0002054, whisper_loss=0.08696, over 14888.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01196, ecapa_loss=0.0002594, whisper_loss=0.09689, over 3887871.02 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:34:30,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555270.0, ans=0.1 2024-08-10 12:34:43,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2024-08-10 12:35:02,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=555470.0, ans=0.04949747468305833 2024-08-10 12:35:15,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=555570.0, ans=0.0 2024-08-10 12:35:33,877 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 12:35:37,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=555670.0, ans=0.0 2024-08-10 12:35:49,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12100, loss[loss=0.1111, beats_loss=0.01121, ecapa_loss=0.0002508, whisper_loss=0.09737, over 15003.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01194, ecapa_loss=0.0002603, whisper_loss=0.0969, over 3887433.31 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:36:00,562 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 12:36:04,036 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 12:36:09,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=555870.0, ans=0.125 2024-08-10 12:36:15,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=555870.0, ans=0.04949747468305833 2024-08-10 12:36:18,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.614e-03 2024-08-10 12:36:30,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.425e+01 2.918e+01 3.194e+01 3.735e+01 7.690e+01, threshold=6.389e+01, percent-clipped=2.0 2024-08-10 12:36:31,103 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 12:36:35,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555970.0, ans=0.1 2024-08-10 12:36:46,542 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-10 12:36:49,604 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 12:36:57,380 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 12:36:57,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=556170.0, ans=0.04949747468305833 2024-08-10 12:36:58,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-10 12:37:03,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=556170.0, ans=0.0 2024-08-10 12:37:15,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12150, loss[loss=0.1122, beats_loss=0.01195, ecapa_loss=0.0002714, whisper_loss=0.09756, over 22325.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01199, ecapa_loss=0.0002601, whisper_loss=0.09611, over 3887190.62 frames. ], batch size: 92, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:37:18,275 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 12:37:30,965 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 12:37:48,125 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 12:37:54,518 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 12:38:34,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12200, loss[loss=0.09024, beats_loss=0.01353, ecapa_loss=0.0002664, whisper_loss=0.07405, over 22109.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01197, ecapa_loss=0.0002578, whisper_loss=0.09641, over 3875484.33 frames. 
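[Editor's note] The recurring "A total of N cuts. a from LS+wenet, b from Vox, c fro AS" lines tally where the cuts in each mixed batch came from ("fro" is a typo in the script's log message, preserved verbatim above). A sketch of that bookkeeping with `collections.Counter`; the origin labels follow the log, the function is illustrative:

```python
from collections import Counter

def summarize_batch_origins(cut_origins):
    """Count cuts per source dataset in one batch and format a summary
    line in the same shape as train_multi_KD3.py's (typo fixed here)."""
    c = Counter(cut_origins)
    return (f"A total of {sum(c.values())} cuts. "
            f"{c['LS+wenet']} from LS+wenet, {c['Vox']} from Vox, "
            f"{c['AS']} from AS")

# Reconstruct the first tally seen in this chunk: 17 / 24 / 21
line = summarize_batch_origins(["LS+wenet"] * 17 + ["Vox"] * 24 + ["AS"] * 21)
```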
], batch size: 91, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:38:36,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=556770.0, ans=0.125 2024-08-10 12:38:40,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2024-08-10 12:39:03,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=556870.0, ans=0.09899494936611666 2024-08-10 12:39:05,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-10 12:39:10,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.794e+01 3.177e+01 3.515e+01 6.137e+01, threshold=6.354e+01, percent-clipped=0.0 2024-08-10 12:39:12,629 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 12:39:21,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=557070.0, ans=0.125 2024-08-10 12:39:23,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=557070.0, ans=0.05 2024-08-10 12:39:40,159 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 12:39:54,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12250, loss[loss=0.1281, beats_loss=0.01075, ecapa_loss=0.0002515, whisper_loss=0.1149, over 23332.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01191, ecapa_loss=0.0002569, whisper_loss=0.09647, over 3893366.21 frames. 
], batch size: 89, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:40:02,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-10 12:40:22,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=557370.0, ans=0.0 2024-08-10 12:40:25,558 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 12:40:43,020 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 12:40:53,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=557570.0, ans=0.125 2024-08-10 12:41:00,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=557670.0, ans=10.0 2024-08-10 12:41:05,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=557670.0, ans=0.0 2024-08-10 12:41:08,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=557670.0, ans=0.125 2024-08-10 12:41:14,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12300, loss[loss=0.111, beats_loss=0.01201, ecapa_loss=0.0002531, whisper_loss=0.09645, over 14129.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01194, ecapa_loss=0.0002578, whisper_loss=0.09627, over 3900913.75 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:41:17,934 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
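[Editor's note] The `Whitening` lines compare a per-module metric against a limit (e.g. `metric=21.06 vs. limit=15.0`); a whitening penalty is applied only when the activations' covariance is far from isotropic. The exact metric lives in icefall's `scaling.py`; the sketch below uses a simple stand-in, `d * tr(C^2) / tr(C)^2` for the feature covariance `C`, which is 1.0 for perfectly white features and approaches `d` when variance collapses onto one direction. This is NOT the exact formula from `scaling.py`:

```python
import random

def whiteness_metric(feats):
    """Illustrative whiteness proxy for a list of d-dim feature rows:
    d * trace(C @ C) / trace(C)**2 for the covariance C.
    1.0 = perfectly white; larger = energy concentrated in few directions.
    An assumption for illustration, not icefall's actual metric."""
    n, d = len(feats), len(feats[0])
    mean = [sum(row[j] for row in feats) / n for j in range(d)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in feats) / n
            for j in range(d)] for i in range(d)]
    tr = sum(cov[i][i] for i in range(d))
    tr2 = sum(cov[i][j] * cov[j][i] for i in range(d) for j in range(d))
    return d * tr2 / tr ** 2

rng = random.Random(0)
white = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]
spiky = [[10 * r[0], r[1], r[2]] for r in white]  # one dominant direction
```

The log's "metric vs. limit" pattern then corresponds to checking such a statistic against a per-module ceiling before applying the penalty.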
21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 12:41:21,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=557770.0, ans=0.0 2024-08-10 12:41:29,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=557870.0, ans=0.125 2024-08-10 12:41:36,240 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 12:41:42,162 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 12:41:49,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.037e+01 3.524e+01 3.995e+01 1.053e+02, threshold=7.048e+01, percent-clipped=4.0 2024-08-10 12:41:53,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=557970.0, ans=0.0 2024-08-10 12:42:10,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=558070.0, ans=0.0 2024-08-10 12:42:13,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0 2024-08-10 12:42:22,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=558170.0, ans=0.0 2024-08-10 12:42:30,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558170.0, ans=0.1 2024-08-10 12:42:33,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12350, loss[loss=0.1187, beats_loss=0.01056, ecapa_loss=0.0002699, whisper_loss=0.1055, over 16896.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01198, ecapa_loss=0.0002615, whisper_loss=0.09642, over 3886628.02 frames. 
], batch size: 65, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:42:34,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558270.0, ans=0.1 2024-08-10 12:42:47,386 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 12:42:55,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=558370.0, ans=0.0 2024-08-10 12:43:00,255 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-10 12:43:07,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=10.0 2024-08-10 12:43:17,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=558470.0, ans=0.2 2024-08-10 12:43:29,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=558570.0, ans=0.125 2024-08-10 12:43:31,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=558570.0, ans=0.0 2024-08-10 12:43:37,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=558570.0, ans=15.0 2024-08-10 12:43:43,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-10 12:43:49,145 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 12:43:49,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=558670.0, ans=0.0 2024-08-10 12:43:56,773 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 12:44:01,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12400, loss[loss=0.09389, beats_loss=0.0122, ecapa_loss=0.0002888, whisper_loss=0.0788, over 20701.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01197, ecapa_loss=0.0002623, whisper_loss=0.0962, over 3904277.31 frames. ], batch size: 86, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:44:07,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-10 12:44:07,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2024-08-10 12:44:10,577 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 12:44:10,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=558770.0, ans=0.2 2024-08-10 12:44:12,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=558770.0, ans=0.0 2024-08-10 12:44:15,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=558770.0, ans=0.0 2024-08-10 12:44:17,432 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 12:44:41,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.951e+01 3.310e+01 3.895e+01 5.650e+01, threshold=6.619e+01, percent-clipped=0.0 2024-08-10 12:44:45,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=558970.0, ans=0.125 2024-08-10 12:44:52,404 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.374e+00 2024-08-10 12:44:53,733 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-10 12:44:56,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=559070.0, ans=0.125 2024-08-10 12:45:24,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=559170.0, ans=0.125 2024-08-10 12:45:26,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12450, loss[loss=0.084, beats_loss=0.0139, ecapa_loss=0.0002397, whisper_loss=0.0677, over 20770.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01197, ecapa_loss=0.0002619, whisper_loss=0.09573, over 3910064.28 frames. ], batch size: 84, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:45:31,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=559270.0, ans=0.125 2024-08-10 12:45:33,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=559270.0, ans=0.125 2024-08-10 12:46:01,509 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 12:46:04,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=559470.0, ans=0.125 2024-08-10 12:46:05,515 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 12:46:45,702 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12500, loss[loss=0.1234, beats_loss=0.009899, ecapa_loss=0.0002665, whisper_loss=0.1108, over 18890.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01199, ecapa_loss=0.000261, whisper_loss=0.09595, over 3904120.84 frames. ], batch size: 74, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:46:50,131 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 12:46:52,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-08-10 12:47:24,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559970.0, ans=0.1 2024-08-10 12:47:28,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=559970.0, ans=0.125 2024-08-10 12:47:28,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 3.210e+01 3.616e+01 4.037e+01 8.521e+01, threshold=7.231e+01, percent-clipped=2.0 2024-08-10 12:47:29,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=559970.0, ans=0.125 2024-08-10 12:47:32,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2024-08-10 12:47:39,477 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 12:47:42,237 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 12:47:42,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=560070.0, ans=0.07 2024-08-10 12:47:45,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=560070.0, ans=0.125 2024-08-10 12:47:58,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=560170.0, ans=0.125 2024-08-10 12:48:03,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2024-08-10 12:48:12,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12550, loss[loss=0.1227, beats_loss=0.01126, ecapa_loss=0.0002036, whisper_loss=0.1094, over 22605.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01205, ecapa_loss=0.0002598, whisper_loss=0.09613, over 3900268.66 frames. ], batch size: 87, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:48:26,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=560270.0, ans=0.125 2024-08-10 12:48:50,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. 
limit=22.5 2024-08-10 12:48:53,633 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.927e-01 2024-08-10 12:49:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=560570.0, ans=0.125 2024-08-10 12:49:10,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=560570.0, ans=0.125 2024-08-10 12:49:22,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=15.0 2024-08-10 12:49:27,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=560670.0, ans=0.0 2024-08-10 12:49:29,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12600, loss[loss=0.1025, beats_loss=0.009044, ecapa_loss=0.0004202, whisper_loss=0.08928, over 13125.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01213, ecapa_loss=0.0002613, whisper_loss=0.09548, over 3897326.18 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:49:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=560770.0, ans=0.125 2024-08-10 12:49:38,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=560770.0, ans=0.1 2024-08-10 12:49:39,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=560770.0, ans=0.0 2024-08-10 12:49:45,273 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 12:50:05,276 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
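[Editor's note] Between batches 12500 and 12550 the logged `grad_scale` jumps from 4294967296.0 to 8589934592.0, the doubling behavior of a dynamic AMP-style loss scaler that grows after a run of overflow-free steps and backs off on overflow. A minimal sketch of that policy; the growth interval of 1000 is hypothetical, and this toy class stands in for (not reproduces) `torch.cuda.amp.GradScaler`:

```python
class DynamicGradScaler:
    """Toy AMP-style loss scaler: double after `growth_interval`
    consecutive finite-gradient steps, halve on overflow.
    Illustrates the policy behind the grad_scale values in the log."""

    def __init__(self, init_scale=4294967296.0, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            self.scale /= 2.0          # back off immediately on inf/nan grads
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2.0      # grow after a clean streak
                self._good_steps = 0

scaler = DynamicGradScaler(growth_interval=1000)
for _ in range(1000):                  # a clean streak doubles the scale
    scaler.update(found_overflow=False)
```

Powers of two are used so that scaling and unscaling are exact in floating point.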
19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 12:50:06,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 3.110e+01 3.572e+01 4.096e+01 7.155e+01, threshold=7.143e+01, percent-clipped=0.0 2024-08-10 12:50:14,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=560970.0, ans=0.0 2024-08-10 12:50:31,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.11 vs. limit=15.0 2024-08-10 12:50:33,430 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 12:50:44,283 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 12:50:46,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12650, loss[loss=0.08871, beats_loss=0.01225, ecapa_loss=0.0002775, whisper_loss=0.07369, over 15658.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01209, ecapa_loss=0.0002604, whisper_loss=0.09604, over 3897887.03 frames. ], batch size: 65, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:50:51,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=561270.0, ans=0.125 2024-08-10 12:50:56,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=561270.0, ans=0.125 2024-08-10 12:50:59,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2024-08-10 12:51:07,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-08-10 12:51:10,180 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 12:51:15,094 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 12:51:16,505 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 12:51:17,906 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-10 12:51:53,585 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 12:52:08,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12700, loss[loss=0.1243, beats_loss=0.01345, ecapa_loss=0.000205, whisper_loss=0.1088, over 22712.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01208, ecapa_loss=0.0002602, whisper_loss=0.09679, over 3921162.88 frames. ], batch size: 89, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:52:12,801 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 12:52:18,047 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 12:52:18,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-08-10 12:52:36,803 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 12:52:39,568 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 12:52:41,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=561970.0, ans=0.125 2024-08-10 12:52:44,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.862e+01 3.101e+01 3.673e+01 6.463e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-10 12:52:48,280 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 12:52:48,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=561970.0, ans=0.5 2024-08-10 12:53:00,038 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 12:53:02,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-08-10 12:53:11,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=562170.0, ans=0.125 2024-08-10 12:53:11,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=562170.0, ans=0.1 2024-08-10 12:53:13,126 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 12:53:25,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=562270.0, ans=0.0 2024-08-10 12:53:26,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12750, loss[loss=0.1363, beats_loss=0.009682, ecapa_loss=0.0002161, whisper_loss=0.1245, over 23197.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01203, ecapa_loss=0.0002598, whisper_loss=0.09702, over 3906948.52 frames. ], batch size: 83, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:53:28,247 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 12:53:29,615 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 12:53:32,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562270.0, ans=0.125 2024-08-10 12:53:35,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=8.0 2024-08-10 12:53:44,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=562370.0, ans=0.2 2024-08-10 12:53:52,157 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 12:54:04,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=562470.0, ans=0.0 2024-08-10 12:54:17,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562570.0, ans=0.125 2024-08-10 12:54:17,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=562570.0, ans=0.2 2024-08-10 12:54:24,076 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 12:54:36,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=562670.0, ans=0.0 2024-08-10 12:54:41,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=562770.0, ans=0.125 2024-08-10 12:54:42,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12800, loss[loss=0.1195, beats_loss=0.01002, ecapa_loss=0.0002747, whisper_loss=0.1068, over 22672.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01197, ecapa_loss=0.0002619, whisper_loss=0.09725, over 3913116.12 frames. 
], batch size: 92, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:54:46,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-10 12:54:51,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=562770.0, ans=0.05 2024-08-10 12:54:52,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=12.0 2024-08-10 12:55:11,011 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-10 12:55:11,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=562870.0, ans=0.125 2024-08-10 12:55:17,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.950e+01 3.581e+01 4.034e+01 6.155e+01, threshold=7.162e+01, percent-clipped=0.0 2024-08-10 12:55:19,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=562970.0, ans=0.0 2024-08-10 12:55:21,128 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 12:55:30,844 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 12:55:48,144 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.181e-01 2024-08-10 12:55:55,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12850, loss[loss=0.1137, beats_loss=0.01237, ecapa_loss=0.0002068, whisper_loss=0.09924, over 19407.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01203, ecapa_loss=0.0002613, whisper_loss=0.09618, over 3884580.85 frames. 
], batch size: 73, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:55:56,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-10 12:55:58,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=563270.0, ans=0.05 2024-08-10 12:55:58,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=563270.0, ans=0.125 2024-08-10 12:56:25,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-08-10 12:56:30,606 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 33 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 12:56:36,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-10 12:57:05,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12900, loss[loss=0.1141, beats_loss=0.01175, ecapa_loss=0.0002678, whisper_loss=0.09967, over 18776.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01202, ecapa_loss=0.0002621, whisper_loss=0.09605, over 3854567.37 frames. ], batch size: 77, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:57:38,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.781e+01 3.198e+01 3.852e+01 6.418e+01, threshold=6.396e+01, percent-clipped=0.0 2024-08-10 12:57:40,176 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 12:57:42,734 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 12:57:48,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=564070.0, ans=0.0 2024-08-10 12:57:48,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2024-08-10 12:57:51,883 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 12:57:58,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=564070.0, ans=0.015 2024-08-10 12:58:08,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=564170.0, ans=0.0 2024-08-10 12:58:15,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564270.0, ans=0.1 2024-08-10 12:58:15,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 12950, loss[loss=0.0968, beats_loss=0.01361, ecapa_loss=0.000251, whisper_loss=0.08068, over 16905.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01204, ecapa_loss=0.0002633, whisper_loss=0.09567, over 3858583.06 frames. ], batch size: 69, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:58:23,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=564270.0, ans=0.125 2024-08-10 12:58:26,032 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 12:58:50,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-10 12:58:53,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. 
limit=22.5 2024-08-10 12:59:18,540 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 12:59:21,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=564670.0, ans=0.125 2024-08-10 12:59:23,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13000, loss[loss=0.1037, beats_loss=0.01351, ecapa_loss=0.0002693, whisper_loss=0.08745, over 20281.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01206, ecapa_loss=0.0002599, whisper_loss=0.09589, over 3883903.75 frames. ], batch size: 83, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:59:34,679 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 12:59:52,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564970.0, ans=0.1 2024-08-10 12:59:54,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.945e+01 3.366e+01 4.220e+01 5.870e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 13:00:02,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=564970.0, ans=0.125 2024-08-10 13:00:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=565070.0, ans=0.125 2024-08-10 13:00:24,449 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 13:00:24,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=565170.0, ans=0.0 2024-08-10 13:00:32,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13050, loss[loss=0.1243, beats_loss=0.01087, ecapa_loss=0.0002777, whisper_loss=0.1106, over 15409.00 frames. 
], tot_loss[loss=0.1098, beats_loss=0.01208, ecapa_loss=0.0002596, whisper_loss=0.09517, over 3862232.83 frames. ], batch size: 61, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:00:41,795 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 13:00:44,669 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-10 13:00:55,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=565370.0, ans=0.125 2024-08-10 13:01:15,358 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 13:01:32,868 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 13:01:45,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565670.0, ans=0.1 2024-08-10 13:01:48,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13100, loss[loss=0.1028, beats_loss=0.01358, ecapa_loss=0.0002088, whisper_loss=0.08715, over 20230.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01217, ecapa_loss=0.0002576, whisper_loss=0.09476, over 3867180.00 frames. ], batch size: 79, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:02:07,684 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 13:02:26,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 3.014e+01 3.400e+01 3.856e+01 6.675e+01, threshold=6.801e+01, percent-clipped=0.0 2024-08-10 13:02:33,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=565970.0, ans=0.125 2024-08-10 13:02:45,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=566070.0, ans=0.2 2024-08-10 13:03:01,269 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 13:03:01,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=566170.0, ans=0.0 2024-08-10 13:03:06,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2024-08-10 13:03:10,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13150, loss[loss=0.1418, beats_loss=0.009683, ecapa_loss=0.0002065, whisper_loss=0.13, over 15643.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01215, ecapa_loss=0.000256, whisper_loss=0.09502, over 3833769.81 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:03:17,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=566270.0, ans=0.125 2024-08-10 13:03:18,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. 
limit=15.0 2024-08-10 13:03:21,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=566270.0, ans=0.0 2024-08-10 13:03:23,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=566270.0, ans=0.125 2024-08-10 13:03:31,021 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 13:03:37,672 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 13:04:10,034 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-10 13:04:11,382 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 13:04:31,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13200, loss[loss=0.1118, beats_loss=0.01151, ecapa_loss=0.0002946, whisper_loss=0.09738, over 22164.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0121, ecapa_loss=0.0002563, whisper_loss=0.09547, over 3848487.72 frames. ], batch size: 93, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:04:32,356 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 13:05:07,728 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.868e+01 3.486e+01 3.850e+01 5.808e+01, threshold=6.972e+01, percent-clipped=0.0 2024-08-10 13:05:08,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=566970.0, ans=0.2 2024-08-10 13:05:11,239 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 13:05:15,139 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.608e-02 2024-08-10 13:05:21,451 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 13:05:25,838 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 13:05:26,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=567070.0, ans=0.0 2024-08-10 13:05:29,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2024-08-10 13:05:49,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567270.0, ans=0.125 2024-08-10 13:05:50,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13250, loss[loss=0.122, beats_loss=0.009167, ecapa_loss=0.0002987, whisper_loss=0.1099, over 17238.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01207, ecapa_loss=0.0002571, whisper_loss=0.09565, over 3850230.99 frames. ], batch size: 68, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:06:09,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567370.0, ans=0.1 2024-08-10 13:06:13,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=567370.0, ans=0.0 2024-08-10 13:06:18,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5 2024-08-10 13:06:19,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=567370.0, ans=0.2 2024-08-10 13:07:02,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=567670.0, ans=0.95 2024-08-10 13:07:10,844 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 13:07:11,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567770.0, ans=0.1 2024-08-10 13:07:11,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13300, loss[loss=0.09312, beats_loss=0.01406, ecapa_loss=0.0002266, whisper_loss=0.0768, over 13276.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01205, ecapa_loss=0.0002566, whisper_loss=0.09524, over 3865573.40 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:07:12,091 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-10 13:07:38,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:39,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=567870.0, ans=0.2 2024-08-10 13:07:48,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 3.028e+01 3.547e+01 3.970e+01 7.425e+01, threshold=7.095e+01, percent-clipped=1.0 2024-08-10 13:07:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=567970.0, ans=0.125 2024-08-10 13:07:58,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=568070.0, ans=0.0 2024-08-10 13:08:31,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=568270.0, ans=0.125 2024-08-10 13:08:31,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13350, loss[loss=0.1171, beats_loss=0.0113, ecapa_loss=0.0003074, whisper_loss=0.1027, over 16899.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01207, ecapa_loss=0.0002543, whisper_loss=0.09581, over 3856445.78 frames. 
], batch size: 68, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:08:50,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568370.0, ans=0.1 2024-08-10 13:09:17,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568570.0, ans=0.1 2024-08-10 13:09:45,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568770.0, ans=0.1 2024-08-10 13:09:46,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13400, loss[loss=0.1381, beats_loss=0.009747, ecapa_loss=0.0002607, whisper_loss=0.1257, over 22514.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01199, ecapa_loss=0.0002556, whisper_loss=0.09546, over 3831128.15 frames. ], batch size: 89, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:09:56,566 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 13:09:56,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=568770.0, ans=0.2 2024-08-10 13:10:00,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=15.0 2024-08-10 13:10:13,462 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 13:10:18,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. 
limit=15.0 2024-08-10 13:10:18,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.910e+01 3.380e+01 3.958e+01 6.126e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-10 13:10:22,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=568970.0, ans=0.125 2024-08-10 13:10:42,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=569170.0, ans=0.0 2024-08-10 13:10:50,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.75 vs. limit=10.0 2024-08-10 13:10:56,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13450, loss[loss=0.1053, beats_loss=0.01253, ecapa_loss=0.0002314, whisper_loss=0.09046, over 13165.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01208, ecapa_loss=0.0002563, whisper_loss=0.09568, over 3850981.65 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:11:14,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2024-08-10 13:11:24,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=569470.0, ans=0.0 2024-08-10 13:11:29,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. 
limit=6.0 2024-08-10 13:11:35,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=569470.0, ans=0.0 2024-08-10 13:11:39,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=569570.0, ans=0.05 2024-08-10 13:11:40,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569570.0, ans=0.125 2024-08-10 13:11:51,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=569670.0, ans=0.0 2024-08-10 13:12:01,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-10 13:12:04,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13500, loss[loss=0.1041, beats_loss=0.01164, ecapa_loss=0.0002008, whisper_loss=0.09045, over 18890.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01208, ecapa_loss=0.0002567, whisper_loss=0.09584, over 3852583.97 frames. ], batch size: 72, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:12:04,226 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 13:12:10,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569770.0, ans=0.1 2024-08-10 13:12:33,160 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 13:12:35,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. 
limit=22.5 2024-08-10 13:12:35,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.996e+01 3.434e+01 4.154e+01 6.721e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 13:12:37,118 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 13:12:38,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=569970.0, ans=0.0 2024-08-10 13:12:40,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2024-08-10 13:12:44,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=570070.0, ans=0.025 2024-08-10 13:12:45,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2024-08-10 13:12:54,353 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 13:13:03,576 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 13:13:11,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13550, loss[loss=0.1024, beats_loss=0.0113, ecapa_loss=0.0002812, whisper_loss=0.08833, over 18407.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01201, ecapa_loss=0.0002567, whisper_loss=0.09591, over 3872699.68 frames. 
], batch size: 76, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:13:17,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=570270.0, ans=0.05 2024-08-10 13:13:27,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=570370.0, ans=0.125 2024-08-10 13:13:29,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570370.0, ans=0.1 2024-08-10 13:13:37,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=570470.0, ans=0.2 2024-08-10 13:13:51,145 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 13:13:53,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=570570.0, ans=0.2 2024-08-10 13:13:54,228 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 13:13:54,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=570570.0, ans=0.2 2024-08-10 13:14:03,089 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 13:14:13,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=570670.0, ans=0.125 2024-08-10 13:14:16,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13600, loss[loss=0.1248, beats_loss=0.01042, ecapa_loss=0.0003, whisper_loss=0.1113, over 22439.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01202, ecapa_loss=0.000255, whisper_loss=0.09577, over 3858172.88 frames. 
], batch size: 94, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:14:18,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0 2024-08-10 13:14:27,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=570770.0, ans=0.1 2024-08-10 13:14:46,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=570970.0, ans=0.2 2024-08-10 13:14:47,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.840e+01 3.248e+01 3.798e+01 4.801e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 13:14:55,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571070.0, ans=0.1 2024-08-10 13:15:09,966 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 13:15:12,929 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-10 13:15:13,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2024-08-10 13:15:22,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13650, loss[loss=0.09841, beats_loss=0.0141, ecapa_loss=0.0002476, whisper_loss=0.08183, over 17743.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01207, ecapa_loss=0.0002559, whisper_loss=0.09643, over 3872764.70 frames. 
], batch size: 75, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:15:23,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=571270.0, ans=0.05 2024-08-10 13:15:25,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=571270.0, ans=0.0 2024-08-10 13:15:25,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=571270.0, ans=0.125 2024-08-10 13:15:30,268 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 13:15:35,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2024-08-10 13:15:46,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=571370.0, ans=0.125 2024-08-10 13:16:04,283 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 13:16:05,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=571570.0, ans=0.125 2024-08-10 13:16:13,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571570.0, ans=0.1 2024-08-10 13:16:17,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=571670.0, ans=0.015 2024-08-10 13:16:22,623 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-10 13:16:25,297 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
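Aside: the many `ScheduledFloat` records (e.g. `dropout_p ... ans=0.1`, `scale_min ... ans=0.2`) log hyperparameters that vary with `batch_count`. A standalone sketch of a piecewise-linear schedule clamped at its endpoints, assuming that is roughly how these values are produced (this reimplementation and the example breakpoints are illustrative, not the `scaling.py` original):

```python
# Sketch (assumption): a ScheduledFloat-like value as a piecewise-linear
# function of batch_count, clamped at the first/last breakpoint. Hypothetical
# breakpoints below; only the logged final values resemble the log.
import bisect

def scheduled_float(batch_count, points):
    """points: sorted (batch_count, value) breakpoints."""
    xs = [x for x, _ in points]
    if batch_count <= xs[0]:
        return points[0][1]
    if batch_count >= xs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(xs, batch_count)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

# Hypothetical schedule: 0.3 until batch 20000, decaying linearly to 0.125 by 40000.
sched = [(20000, 0.3), (40000, 0.125)]
assert scheduled_float(0, sched) == 0.3
assert abs(scheduled_float(30000, sched) - 0.2125) < 1e-12
assert scheduled_float(570270, sched) == 0.125  # deep into training: endpoint value
```

This would explain why, at `batch_count≈570k`, most logged `ans` values sit at flat endpoint constants such as 0.1, 0.125, or 0.2.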
28 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 13:16:30,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13700, loss[loss=0.1164, beats_loss=0.01255, ecapa_loss=0.0001831, whisper_loss=0.102, over 15082.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002549, whisper_loss=0.09705, over 3886006.99 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:16:38,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2024-08-10 13:16:40,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2024-08-10 13:17:01,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-10 13:17:01,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.889e+01 3.237e+01 4.000e+01 5.503e+01, threshold=6.474e+01, percent-clipped=0.0 2024-08-10 13:17:09,640 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 13:17:18,291 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:17:37,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=572270.0, ans=0.125 2024-08-10 13:17:38,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13750, loss[loss=0.09094, beats_loss=0.01393, ecapa_loss=0.0003484, whisper_loss=0.07352, over 18849.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01202, ecapa_loss=0.0002553, whisper_loss=0.0967, over 3907581.40 frames. 
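Aside: in the `optim.py` records above, the clipping threshold consistently equals `Clipping_scale` times the median grad norm (the middle quartile), e.g. 2.0 × 3.237e+01 = 6.474e+01. A sketch under that assumption (function names are illustrative only):

```python
# Sketch (assumption): the optim.py lines appear to set the clipping threshold
# to clipping_scale times the median of recent grad norms, and report what
# fraction of norms exceeded it as percent-clipped.
def clip_threshold(grad_norm_quartiles, clipping_scale=2.0):
    """grad_norm_quartiles: [min, q25, median, q75, max] of recent grad norms."""
    median = grad_norm_quartiles[2]
    return clipping_scale * median

def percent_clipped(grad_norms, threshold):
    return 100.0 * sum(n > threshold for n in grad_norms) / len(grad_norms)

# From the batch-13650 log record: quartiles 2.031e+01 ... 3.237e+01 ... 5.503e+01
q = [2.031e+01, 2.889e+01, 3.237e+01, 4.000e+01, 5.503e+01]
assert clip_threshold(q) == 6.474e+01
assert percent_clipped([25.0, 40.0, 55.0], clip_threshold(q)) == 0.0
```

With `percent-clipped=0.0` on most of these lines, the gradient norms rarely exceed twice their running median at this stage of training.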
], batch size: 83, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:17:53,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2024-08-10 13:17:59,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=572370.0, ans=0.125 2024-08-10 13:18:03,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.42 vs. limit=22.5 2024-08-10 13:18:10,621 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 13:18:16,422 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 13:18:45,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=572770.0, ans=0.125 2024-08-10 13:18:46,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13800, loss[loss=0.1161, beats_loss=0.009978, ecapa_loss=0.0002843, whisper_loss=0.1033, over 19339.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01195, ecapa_loss=0.0002552, whisper_loss=0.09645, over 3898050.45 frames. ], batch size: 75, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:18:46,552 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 13:18:55,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=572770.0, ans=0.0 2024-08-10 13:19:08,717 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 13:19:18,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 2.990e+01 3.419e+01 4.092e+01 5.899e+01, threshold=6.838e+01, percent-clipped=0.0 2024-08-10 13:19:22,373 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 13:19:25,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=572970.0, ans=0.125 2024-08-10 13:19:29,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-10 13:19:43,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=573170.0, ans=0.0 2024-08-10 13:19:47,070 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 13:19:54,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13850, loss[loss=0.1037, beats_loss=0.01247, ecapa_loss=0.0002816, whisper_loss=0.08842, over 22370.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01194, ecapa_loss=0.0002527, whisper_loss=0.09667, over 3910481.75 frames. ], batch size: 92, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:19:56,418 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-10 13:20:07,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=573370.0, ans=0.0 2024-08-10 13:20:17,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. 
limit=15.0 2024-08-10 13:20:34,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-08-10 13:21:00,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=573670.0, ans=0.125 2024-08-10 13:21:03,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13900, loss[loss=0.13, beats_loss=0.01046, ecapa_loss=0.0002634, whisper_loss=0.1169, over 17313.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01196, ecapa_loss=0.0002524, whisper_loss=0.09659, over 3911368.89 frames. ], batch size: 68, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:21:05,375 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 13:21:23,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=573870.0, ans=0.125 2024-08-10 13:21:30,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573970.0, ans=0.125 2024-08-10 13:21:35,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 3.046e+01 3.391e+01 3.778e+01 5.936e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 13:21:47,240 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 13:22:13,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 13950, loss[loss=0.1047, beats_loss=0.009311, ecapa_loss=0.0003124, whisper_loss=0.09224, over 13278.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01187, ecapa_loss=0.0002522, whisper_loss=0.09663, over 3906172.80 frames. ], batch size: 54, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:22:17,426 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 13:22:18,784 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 13:22:34,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=574370.0, ans=0.125 2024-08-10 13:22:37,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=574370.0, ans=0.0 2024-08-10 13:22:58,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=574570.0, ans=0.0 2024-08-10 13:23:00,389 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 13:23:07,528 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 13:23:22,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14000, loss[loss=0.1259, beats_loss=0.009221, ecapa_loss=0.0002664, whisper_loss=0.1141, over 21592.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01191, ecapa_loss=0.0002515, whisper_loss=0.0964, over 3860324.78 frames. ], batch size: 84, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:23:24,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2024-08-10 13:23:35,555 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 13:23:50,903 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 13:23:55,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.914e+01 3.235e+01 3.866e+01 6.339e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-10 13:23:57,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=574970.0, ans=0.2 2024-08-10 13:24:01,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=574970.0, ans=0.125 2024-08-10 13:24:02,434 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 13:24:05,087 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 13:24:12,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2024-08-10 13:24:16,047 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 13:24:17,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575170.0, ans=0.1 2024-08-10 13:24:19,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575170.0, ans=0.1 2024-08-10 13:24:26,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-08-10 13:24:33,125 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 13:24:34,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14050, loss[loss=0.1154, beats_loss=0.01174, ecapa_loss=0.0002797, whisper_loss=0.1009, over 18974.00 frames. 
], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002499, whisper_loss=0.09633, over 3867403.64 frames. ], batch size: 80, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:24:58,822 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 13:25:13,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575470.0, ans=0.125 2024-08-10 13:25:32,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575670.0, ans=0.125 2024-08-10 13:25:47,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14100, loss[loss=0.09026, beats_loss=0.01008, ecapa_loss=0.0003052, whisper_loss=0.07713, over 16767.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01206, ecapa_loss=0.0002517, whisper_loss=0.09614, over 3859810.50 frames. ], batch size: 69, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:26:22,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=575970.0, ans=0.0 2024-08-10 13:26:25,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.001e+01 3.648e+01 4.223e+01 8.641e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 13:26:27,306 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 13:26:35,668 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 13:26:38,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576070.0, ans=0.0 2024-08-10 13:26:41,735 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
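Aside: the recurring `A total of N cuts` records summarize how each mixed minibatch is drawn from the three cut sources, and the per-source counts always sum to the total (e.g. 22 + 27 + 31 = 80 above). A small illustrative sketch of that bookkeeping (not the `train_multi_KD3.py` implementation):

```python
# Sketch (assumption): count batch composition by data source, as the
# "A total of N cuts" log lines do. summarize_cuts is illustrative only.
from collections import Counter

def summarize_cuts(origins):
    """origins: one source tag per cut in the batch."""
    c = Counter(origins)
    total = sum(c.values())
    return (f"A total of {total} cuts. {c['LS+wenet']} from LS+wenet, "
            f"{c['Vox']} from Vox, {c['AS']} from AS")

batch = ["LS+wenet"] * 22 + ["Vox"] * 27 + ["AS"] * 31
print(summarize_cuts(batch))
# -> A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 from AS
```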
25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 13:26:47,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=576070.0, ans=15.0 2024-08-10 13:26:59,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576170.0, ans=0.1 2024-08-10 13:27:04,020 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 13:27:08,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14150, loss[loss=0.1031, beats_loss=0.01232, ecapa_loss=0.0002777, whisper_loss=0.08796, over 22054.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01207, ecapa_loss=0.0002526, whisper_loss=0.09643, over 3883147.98 frames. ], batch size: 92, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:27:14,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=576270.0, ans=0.0 2024-08-10 13:27:19,211 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 13:27:32,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=576370.0, ans=0.07 2024-08-10 13:27:47,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-08-10 13:27:51,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576470.0, ans=0.125 2024-08-10 13:28:04,799 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 13:28:09,729 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 13:28:19,023 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-10 13:28:32,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14200, loss[loss=0.1273, beats_loss=0.009609, ecapa_loss=0.0002742, whisper_loss=0.115, over 20169.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01198, ecapa_loss=0.0002519, whisper_loss=0.0968, over 3888281.60 frames. ], batch size: 78, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:28:36,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=15.0 2024-08-10 13:28:43,493 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 13:28:45,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.17 vs. limit=15.0 2024-08-10 13:28:58,153 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 13:29:04,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-10 13:29:09,999 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 13:29:17,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 3.094e+01 3.432e+01 3.863e+01 7.530e+01, threshold=6.863e+01, percent-clipped=1.0 2024-08-10 13:30:09,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14250, loss[loss=0.1052, beats_loss=0.01054, ecapa_loss=0.0002309, whisper_loss=0.0924, over 14129.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01202, ecapa_loss=0.0002498, whisper_loss=0.0963, over 3854859.22 frames. 
], batch size: 54, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:30:19,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=577270.0, ans=0.0 2024-08-10 13:30:29,491 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 13:30:32,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=577370.0, ans=0.125 2024-08-10 13:30:43,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=577370.0, ans=0.0 2024-08-10 13:30:48,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=577370.0, ans=0.0 2024-08-10 13:31:23,159 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-10 13:31:34,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-10 13:31:44,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=577670.0, ans=0.125 2024-08-10 13:31:55,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=577770.0, ans=0.125 2024-08-10 13:31:56,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14300, loss[loss=0.1067, beats_loss=0.01426, ecapa_loss=0.0002713, whisper_loss=0.08968, over 22886.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01197, ecapa_loss=0.0002513, whisper_loss=0.09685, over 3887313.32 frames. 
], batch size: 94, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:32:29,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=577870.0, ans=0.125 2024-08-10 13:32:44,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.908e+01 3.226e+01 3.811e+01 6.354e+01, threshold=6.452e+01, percent-clipped=0.0 2024-08-10 13:32:53,355 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 13:32:53,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=577970.0, ans=0.07 2024-08-10 13:33:37,686 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:33:38,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-10 13:33:42,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14350, loss[loss=0.1104, beats_loss=0.01312, ecapa_loss=0.0002737, whisper_loss=0.09454, over 18212.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01199, ecapa_loss=0.000253, whisper_loss=0.09657, over 3893184.19 frames. ], batch size: 74, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:34:17,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=578470.0, ans=0.125 2024-08-10 13:34:32,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=578570.0, ans=0.125 2024-08-10 13:34:37,854 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 13:34:38,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-10 13:34:42,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=578670.0, ans=0.5 2024-08-10 13:34:47,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=578670.0, ans=0.125 2024-08-10 13:34:52,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14400, loss[loss=0.1122, beats_loss=0.01265, ecapa_loss=0.0002944, whisper_loss=0.09665, over 21922.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01198, ecapa_loss=0.0002533, whisper_loss=0.09673, over 3879587.57 frames. ], batch size: 91, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:34:53,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=578770.0, ans=0.125 2024-08-10 13:34:53,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-10 13:34:54,693 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 13:35:04,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=578770.0, ans=0.125 2024-08-10 13:35:11,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578870.0, ans=0.1 2024-08-10 13:35:24,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 3.244e+01 3.522e+01 4.448e+01 1.287e+02, threshold=7.043e+01, percent-clipped=5.0 2024-08-10 13:35:25,267 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 13:35:30,607 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 13:35:38,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=579070.0, ans=0.125 2024-08-10 13:35:38,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.64 vs. limit=10.0 2024-08-10 13:35:39,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=579070.0, ans=0.125 2024-08-10 13:35:48,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-10 13:36:02,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 4, batch 14450, loss[loss=0.1098, beats_loss=0.0114, ecapa_loss=0.0002752, whisper_loss=0.09565, over 19188.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01202, ecapa_loss=0.0002555, whisper_loss=0.09587, over 3889054.17 frames. ], batch size: 77, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:36:08,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=579270.0, ans=22.5 2024-08-10 13:36:26,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=579370.0, ans=0.125 2024-08-10 13:36:32,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=579470.0, ans=0.0 2024-08-10 13:36:40,717 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 13:36:48,477 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 13:36:54,973 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 13:37:44,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 0, loss[loss=0.1231, beats_loss=0.01286, ecapa_loss=0.0002323, whisper_loss=0.1079, over 24132.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01286, ecapa_loss=0.0002323, whisper_loss=0.1079, over 24132.00 frames. ], batch size: 94, lr: 1.31e-02, grad_scale: 8589934592.0 2024-08-10 13:37:44,861 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 13:38:27,677 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007699, whisper_loss=0.2545, over 922467.00 frames. 2024-08-10 13:38:42,863 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on SV_voxceleb1: loss=0.006763, beats_loss=0, ecapa_loss=0.0006763, whisper_loss=0, over 939242.00 frames. 2024-08-10 13:40:39,895 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on AT_audioset: loss=0.02719, beats_loss=0.02719, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 13:40:39,898 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 13:40:53,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=579720.0, ans=0.125 2024-08-10 13:41:18,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=579820.0, ans=0.2 2024-08-10 13:41:26,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=579820.0, ans=0.2 2024-08-10 13:41:29,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579820.0, ans=0.125 2024-08-10 13:41:36,590 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
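Aside: in the Epoch 5 validation records above, each validation set exercises only the teacher head(s) its labels support, so the unused losses are logged as 0 (`ASR_libri` has `beats_loss=0`, `SV_voxceleb1` is ECAPA-only, `AT_audioset` is BEATs-only). Reusing the config's loss scales reproduces all three logged totals; a sketch under that assumption (the dict layout is illustrative):

```python
# Sketch (assumption): per-task validation loss as the same weighted sum used
# for training, with the heads a dataset cannot supervise contributing 0.
SCALES = {"beats": 1.0, "ecapa": 10.0, "whisper": 1.0}

def valid_loss(parts):
    return sum(SCALES[k] * v for k, v in parts.items())

asr   = valid_loss({"beats": 0.0,     "ecapa": 0.0007699, "whisper": 0.2545})
sv    = valid_loss({"beats": 0.0,     "ecapa": 0.0006763, "whisper": 0.0})
audio = valid_loss({"beats": 0.02719, "ecapa": 0.0,       "whisper": 0.0})
assert abs(asr - 0.2622) < 5e-4      # validation on ASR_libri
assert abs(sv - 0.006763) < 1e-9     # validation on SV_voxceleb1 (10x ecapa)
assert abs(audio - 0.02719) < 1e-9   # validation on AT_audioset (beats only)
```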
36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 13:41:53,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 3.049e+01 3.546e+01 4.164e+01 6.478e+01, threshold=7.092e+01, percent-clipped=0.0 2024-08-10 13:42:33,538 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-10 13:42:46,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 50, loss[loss=0.1275, beats_loss=0.01165, ecapa_loss=0.000246, whisper_loss=0.1134, over 23390.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.0118, ecapa_loss=0.0002555, whisper_loss=0.09716, over 916266.71 frames. ], batch size: 94, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:43:16,655 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-10 13:43:33,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=580420.0, ans=0.125 2024-08-10 13:43:40,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-08-10 13:43:45,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=580420.0, ans=0.025 2024-08-10 13:43:56,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=580520.0, ans=0.125 2024-08-10 13:44:00,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-10 13:44:07,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=580520.0, ans=0.0 2024-08-10 13:44:07,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.69 vs. 
limit=12.0 2024-08-10 13:44:27,148 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-10 13:44:36,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=580620.0, ans=0.04949747468305833 2024-08-10 13:44:41,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 100, loss[loss=0.1117, beats_loss=0.01098, ecapa_loss=0.0002675, whisper_loss=0.09804, over 17461.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01154, ecapa_loss=0.0002547, whisper_loss=0.09686, over 1566285.40 frames. ], batch size: 69, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:45:03,252 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 13:45:03,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=580820.0, ans=0.125 2024-08-10 13:45:05,734 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 13:45:20,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580820.0, ans=0.1 2024-08-10 13:45:33,864 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:45:36,079 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
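Aside: shortly after the Epoch 5 boundary, `grad_scale` jumps from 8589934592.0 (2^33) to 17179869184.0 (2^34). This is consistent with a GradScaler-style dynamic loss scale that doubles after a run of overflow-free batches and backs off on inf/nan gradients; `ToyGradScaler` below is an illustrative sketch under that assumption, not icefall code:

```python
# Sketch (assumption): a toy dynamic loss scaler that doubles after
# growth_interval clean steps and halves on overflow, mirroring the
# 2**33 -> 2**34 jump seen in the log. Illustrative only.
class ToyGradScaler:
    def __init__(self, init_scale=2.0**33, growth_factor=2.0, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale /= self.growth_factor  # back off on overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor  # double after a clean run
                self._good_steps = 0

scaler = ToyGradScaler()
for _ in range(2000):                 # 2000 overflow-free steps -> one doubling
    scaler.update(found_inf=False)
assert scaler.scale == 17179869184.0  # the grad_scale logged above
```

The `growth_interval=2000` here is a hypothetical placeholder; the log only shows that the scale doubled somewhere between Epoch 4 batch 14450 and Epoch 5 batch 50.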
19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 13:45:38,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=580920.0, ans=0.125 2024-08-10 13:45:43,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 3.227e+01 3.615e+01 4.275e+01 6.139e+01, threshold=7.229e+01, percent-clipped=0.0 2024-08-10 13:46:01,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=581020.0, ans=0.0 2024-08-10 13:46:16,639 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 13:46:26,352 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 13:46:27,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 150, loss[loss=0.1456, beats_loss=0.01037, ecapa_loss=0.000285, whisper_loss=0.1324, over 20803.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01149, ecapa_loss=0.0002546, whisper_loss=0.0964, over 2066190.64 frames. ], batch size: 81, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:46:57,369 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-10 13:47:13,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2024-08-10 13:47:16,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=581520.0, ans=0.0 2024-08-10 13:47:33,783 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-10 13:47:40,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=581620.0, ans=0.02 2024-08-10 13:47:46,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 200, loss[loss=0.09562, beats_loss=0.009261, ecapa_loss=0.0002964, whisper_loss=0.08339, over 17763.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01161, ecapa_loss=0.0002542, whisper_loss=0.0959, over 2431317.15 frames. ], batch size: 71, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:47:47,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=581720.0, ans=0.0 2024-08-10 13:47:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=581720.0, ans=0.0 2024-08-10 13:48:11,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=12.0 2024-08-10 13:48:14,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=581820.0, ans=0.125 2024-08-10 13:48:24,567 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 13:48:24,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=581920.0, ans=0.2 2024-08-10 13:48:28,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 3.079e+01 3.499e+01 4.044e+01 6.352e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-10 13:48:31,930 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 13:48:54,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. 
limit=10.0 2024-08-10 13:49:01,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 250, loss[loss=0.1125, beats_loss=0.01089, ecapa_loss=0.0002452, whisper_loss=0.09912, over 15324.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01162, ecapa_loss=0.0002508, whisper_loss=0.096, over 2725848.15 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:49:18,469 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.082e+05 2024-08-10 13:49:22,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=582320.0, ans=0.0 2024-08-10 13:49:25,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=582320.0, ans=0.125 2024-08-10 13:49:48,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=582520.0, ans=0.125 2024-08-10 13:49:48,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2024-08-10 13:50:16,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 300, loss[loss=0.1285, beats_loss=0.0107, ecapa_loss=0.00029, whisper_loss=0.1149, over 14377.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01174, ecapa_loss=0.0002501, whisper_loss=0.09491, over 2965277.70 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:50:21,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=582720.0, ans=0.125 2024-08-10 13:50:27,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582720.0, ans=0.1 2024-08-10 13:50:28,456 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
13 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 13:50:45,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582920.0, ans=0.125 2024-08-10 13:50:45,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-10 13:50:49,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=582920.0, ans=0.2 2024-08-10 13:50:58,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.973e+01 3.374e+01 4.127e+01 8.161e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 13:51:02,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=583020.0, ans=0.015 2024-08-10 13:51:09,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=15.0 2024-08-10 13:51:13,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=583020.0, ans=0.125 2024-08-10 13:51:16,002 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 13:51:20,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=583120.0, ans=0.2 2024-08-10 13:51:30,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 350, loss[loss=0.105, beats_loss=0.01184, ecapa_loss=0.0002541, whisper_loss=0.09058, over 21728.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0119, ecapa_loss=0.0002477, whisper_loss=0.09317, over 3145214.29 frames. 
], batch size: 90, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:51:34,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=583220.0, ans=0.0 2024-08-10 13:51:34,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=583220.0, ans=0.0 2024-08-10 13:51:38,364 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 13:51:42,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2024-08-10 13:51:46,301 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-10 13:51:54,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583320.0, ans=0.1 2024-08-10 13:51:59,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=583420.0, ans=0.125 2024-08-10 13:52:02,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=583420.0, ans=0.0 2024-08-10 13:52:20,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-10 13:52:43,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 400, loss[loss=0.09219, beats_loss=0.01213, ecapa_loss=0.0002078, whisper_loss=0.07798, over 14068.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0119, ecapa_loss=0.000246, whisper_loss=0.09307, over 3285796.12 frames. 
], batch size: 55, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:52:50,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=583720.0, ans=0.015 2024-08-10 13:52:56,731 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 13:53:06,277 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 13:53:08,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-10 13:53:17,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583920.0, ans=0.125 2024-08-10 13:53:17,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583920.0, ans=0.1 2024-08-10 13:53:21,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=583920.0, ans=0.1 2024-08-10 13:53:25,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.918e+01 3.260e+01 3.754e+01 7.890e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-10 13:53:36,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=584020.0, ans=0.0 2024-08-10 13:53:39,143 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 13:53:40,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=584020.0, ans=0.2 2024-08-10 13:53:46,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584120.0, ans=0.1 2024-08-10 13:54:00,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 450, loss[loss=0.1072, beats_loss=0.01355, ecapa_loss=0.0002258, whisper_loss=0.09139, over 22957.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01194, ecapa_loss=0.000244, whisper_loss=0.0925, over 3406595.64 frames. ], batch size: 92, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:54:01,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584220.0, ans=0.1 2024-08-10 13:54:05,202 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 13:54:17,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-10 13:54:27,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=584320.0, ans=0.125 2024-08-10 13:54:38,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=584420.0, ans=0.2 2024-08-10 13:54:38,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=584420.0, ans=0.0 2024-08-10 13:54:46,289 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 13:54:50,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. 
limit=10.0 2024-08-10 13:54:53,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-10 13:54:55,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=584520.0, ans=0.0 2024-08-10 13:54:55,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584520.0, ans=0.1 2024-08-10 13:54:56,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-10 13:55:00,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=584620.0, ans=0.125 2024-08-10 13:55:12,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 500, loss[loss=0.08824, beats_loss=0.01218, ecapa_loss=0.0002481, whisper_loss=0.07358, over 14795.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01189, ecapa_loss=0.0002425, whisper_loss=0.09265, over 3490104.00 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:55:16,402 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:55:21,774 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 13:55:30,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=584820.0, ans=0.035 2024-08-10 13:55:34,379 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 13:55:52,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.793e+01 3.161e+01 3.607e+01 7.948e+01, threshold=6.322e+01, percent-clipped=1.0 2024-08-10 13:55:58,194 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 13:56:06,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=585020.0, ans=0.2 2024-08-10 13:56:08,013 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.763e-02 2024-08-10 13:56:08,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585020.0, ans=0.125 2024-08-10 13:56:24,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 550, loss[loss=0.1223, beats_loss=0.009811, ecapa_loss=0.0002458, whisper_loss=0.11, over 23333.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01183, ecapa_loss=0.0002396, whisper_loss=0.09374, over 3575283.31 frames. ], batch size: 91, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:56:24,135 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 13:56:30,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585220.0, ans=0.1 2024-08-10 13:56:32,559 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 13:56:38,456 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 13:57:02,496 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 13:57:03,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=585420.0, ans=0.0 2024-08-10 13:57:09,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=585520.0, ans=0.05 2024-08-10 13:57:20,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=585620.0, ans=0.2 2024-08-10 13:57:25,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=585620.0, ans=0.125 2024-08-10 13:57:36,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 600, loss[loss=0.1107, beats_loss=0.01217, ecapa_loss=0.0002645, whisper_loss=0.09592, over 20984.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0118, ecapa_loss=0.0002396, whisper_loss=0.09398, over 3638808.84 frames. ], batch size: 85, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:57:59,694 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 13:58:00,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=585820.0, ans=0.1 2024-08-10 13:58:15,479 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 13:58:16,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.818e+01 3.113e+01 3.779e+01 5.763e+01, threshold=6.225e+01, percent-clipped=0.0 2024-08-10 13:58:21,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=586020.0, ans=0.125 2024-08-10 13:58:34,390 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 13:58:48,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 650, loss[loss=0.1194, beats_loss=0.0118, ecapa_loss=0.000247, whisper_loss=0.1051, over 21529.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01187, ecapa_loss=0.0002373, whisper_loss=0.09465, over 3709553.47 frames. ], batch size: 88, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:58:50,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=586220.0, ans=0.0 2024-08-10 13:59:03,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=586320.0, ans=0.125 2024-08-10 13:59:14,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=586320.0, ans=0.125 2024-08-10 13:59:24,055 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 13:59:39,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 13:59:42,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=586520.0, ans=0.0 2024-08-10 13:59:58,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 700, loss[loss=0.09197, beats_loss=0.01429, ecapa_loss=0.0001772, whisper_loss=0.07591, over 18313.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0118, ecapa_loss=0.0002389, whisper_loss=0.09489, over 3735686.85 frames. 
], batch size: 72, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:00:02,670 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:00:08,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=586720.0, ans=0.2 2024-08-10 14:00:08,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2024-08-10 14:00:38,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.878e+01 3.235e+01 3.847e+01 7.521e+01, threshold=6.470e+01, percent-clipped=2.0 2024-08-10 14:00:47,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-10 14:01:07,432 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 14:01:07,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=587120.0, ans=0.0 2024-08-10 14:01:11,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 750, loss[loss=0.1265, beats_loss=0.008119, ecapa_loss=0.0002454, whisper_loss=0.1159, over 16128.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01181, ecapa_loss=0.0002372, whisper_loss=0.09474, over 3768173.15 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:01:15,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-10 14:01:53,939 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 14:02:00,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=12.0 2024-08-10 14:02:21,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 800, loss[loss=0.1153, beats_loss=0.009789, ecapa_loss=0.0003022, whisper_loss=0.1025, over 19421.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01182, ecapa_loss=0.0002392, whisper_loss=0.09442, over 3798792.48 frames. ], batch size: 79, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:02:24,779 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-10 14:02:52,939 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-10 14:02:55,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=15.0 2024-08-10 14:02:57,492 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:03:01,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.857e+01 3.282e+01 4.072e+01 6.223e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 14:03:02,906 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 14:03:33,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 850, loss[loss=0.1132, beats_loss=0.01001, ecapa_loss=0.0002516, whisper_loss=0.1006, over 17029.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01171, ecapa_loss=0.0002384, whisper_loss=0.09472, over 3814191.67 frames. 
], batch size: 69, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:03:55,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=588320.0, ans=0.035 2024-08-10 14:04:16,686 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.736e+05 2024-08-10 14:04:27,867 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 14:04:50,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 900, loss[loss=0.09876, beats_loss=0.01349, ecapa_loss=0.000225, whisper_loss=0.08302, over 18662.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01169, ecapa_loss=0.0002376, whisper_loss=0.09452, over 3803122.40 frames. ], batch size: 77, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:04:52,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=588720.0, ans=0.125 2024-08-10 14:04:54,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.04 vs. limit=10.0 2024-08-10 14:05:00,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=588720.0, ans=0.0 2024-08-10 14:05:10,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=588820.0, ans=0.2 2024-08-10 14:05:10,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. 
limit=6.0 2024-08-10 14:05:17,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=588820.0, ans=0.0 2024-08-10 14:05:17,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0 2024-08-10 14:05:27,077 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 17 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 14:05:32,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.732e+01 3.108e+01 3.625e+01 6.653e+01, threshold=6.216e+01, percent-clipped=1.0 2024-08-10 14:05:34,497 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 14:06:05,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 950, loss[loss=0.06371, beats_loss=0.01276, ecapa_loss=0.0001778, whisper_loss=0.04917, over 14331.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01173, ecapa_loss=0.0002359, whisper_loss=0.09383, over 3792924.05 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:06:08,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=589220.0, ans=0.125 2024-08-10 14:06:09,290 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-10 14:06:30,575 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 14:06:41,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=589420.0, ans=0.95 2024-08-10 14:06:53,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589520.0, ans=0.125 2024-08-10 14:06:55,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=589520.0, ans=0.125 2024-08-10 14:06:58,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=589520.0, ans=0.125 2024-08-10 14:06:58,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-10 14:07:18,883 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 14:07:21,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1000, loss[loss=0.1179, beats_loss=0.01054, ecapa_loss=0.0002053, whisper_loss=0.1053, over 18201.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01179, ecapa_loss=0.0002346, whisper_loss=0.0939, over 3808646.13 frames. ], batch size: 67, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:07:36,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=589820.0, ans=0.09899494936611666 2024-08-10 14:07:38,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=589820.0, ans=0.125 2024-08-10 14:07:39,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. 
limit=6.0 2024-08-10 14:07:44,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-08-10 14:07:51,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-10 14:07:54,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=589920.0, ans=0.0 2024-08-10 14:07:59,220 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 14:08:02,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=589920.0, ans=10.0 2024-08-10 14:08:04,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.773e+01 3.202e+01 3.484e+01 8.284e+01, threshold=6.403e+01, percent-clipped=2.0 2024-08-10 14:08:37,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1050, loss[loss=0.09512, beats_loss=0.01034, ecapa_loss=0.000203, whisper_loss=0.08274, over 16708.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01185, ecapa_loss=0.0002339, whisper_loss=0.09347, over 3820838.83 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:08:38,474 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 14:08:49,058 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 14:09:22,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590420.0, ans=0.1 2024-08-10 14:09:26,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590520.0, ans=0.1 2024-08-10 14:09:38,249 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:09:51,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=590620.0, ans=0.1 2024-08-10 14:09:55,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1100, loss[loss=0.113, beats_loss=0.01319, ecapa_loss=0.0002158, whisper_loss=0.09768, over 15177.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0119, ecapa_loss=0.0002334, whisper_loss=0.09383, over 3836436.32 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:10:03,393 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 14:10:06,348 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 14:10:17,598 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 14:10:19,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=590820.0, ans=0.0 2024-08-10 14:10:39,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=590920.0, ans=0.125 2024-08-10 14:10:42,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.893e+01 3.257e+01 3.748e+01 6.503e+01, threshold=6.515e+01, percent-clipped=1.0 2024-08-10 14:11:04,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=591120.0, ans=0.125 2024-08-10 14:11:12,742 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 14:11:15,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1150, loss[loss=0.09979, beats_loss=0.01738, ecapa_loss=0.0002009, whisper_loss=0.0804, over 17296.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01184, ecapa_loss=0.0002344, whisper_loss=0.0946, over 3866894.96 frames. ], batch size: 69, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:11:23,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=591220.0, ans=0.125 2024-08-10 14:11:24,979 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 14:11:36,283 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 14:11:37,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=591320.0, ans=0.0 2024-08-10 14:11:41,644 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-10 14:11:50,093 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
37 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-10 14:11:53,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=591420.0, ans=0.125 2024-08-10 14:12:01,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591520.0, ans=0.1 2024-08-10 14:12:02,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=591520.0, ans=0.125 2024-08-10 14:12:04,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=591520.0, ans=0.0 2024-08-10 14:12:05,082 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 14:12:16,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=591620.0, ans=0.125 2024-08-10 14:12:24,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=591620.0, ans=0.2 2024-08-10 14:12:29,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1200, loss[loss=0.09427, beats_loss=0.01334, ecapa_loss=0.0002393, whisper_loss=0.07854, over 15694.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01189, ecapa_loss=0.0002319, whisper_loss=0.09392, over 3834866.51 frames. 
], batch size: 64, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:12:40,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=591720.0, ans=0.125 2024-08-10 14:12:42,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=591720.0, ans=0.2 2024-08-10 14:12:54,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591820.0, ans=0.1 2024-08-10 14:13:12,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.894e+01 3.360e+01 3.999e+01 6.251e+01, threshold=6.719e+01, percent-clipped=0.0 2024-08-10 14:13:43,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=592120.0, ans=0.125 2024-08-10 14:13:46,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1250, loss[loss=0.1279, beats_loss=0.01236, ecapa_loss=0.0002213, whisper_loss=0.1133, over 24225.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01181, ecapa_loss=0.0002313, whisper_loss=0.09515, over 3834187.38 frames. ], batch size: 93, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:13:50,496 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 14:13:54,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592220.0, ans=0.1 2024-08-10 14:13:57,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=592220.0, ans=0.0 2024-08-10 14:13:58,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2024-08-10 14:14:17,832 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 14:14:19,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=592420.0, ans=0.0 2024-08-10 14:14:24,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2024-08-10 14:14:31,815 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-10 14:14:42,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2024-08-10 14:14:45,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=22.5 2024-08-10 14:14:50,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=592620.0, ans=0.125 2024-08-10 14:14:56,541 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 14:15:00,951 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 14:15:03,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1300, loss[loss=0.1249, beats_loss=0.01003, ecapa_loss=0.0002426, whisper_loss=0.1124, over 22368.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002328, whisper_loss=0.09495, over 3812273.89 frames. ], batch size: 90, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:15:08,407 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
16 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 14:15:17,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=592720.0, ans=0.1 2024-08-10 14:15:33,392 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 14:15:36,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=592920.0, ans=0.0 2024-08-10 14:15:46,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-08-10 14:15:50,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.731e+01 3.070e+01 3.519e+01 6.243e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 14:15:50,943 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 14:16:24,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1350, loss[loss=0.1153, beats_loss=0.01242, ecapa_loss=0.0001839, whisper_loss=0.1011, over 19547.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01171, ecapa_loss=0.0002328, whisper_loss=0.09496, over 3793661.62 frames. ], batch size: 74, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:16:35,134 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.392e+01 2024-08-10 14:17:04,801 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 14:17:10,329 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-10 14:17:15,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.41 vs. 
limit=15.0 2024-08-10 14:17:33,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=593620.0, ans=0.125 2024-08-10 14:17:41,523 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 14:17:43,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593720.0, ans=0.1 2024-08-10 14:17:44,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1400, loss[loss=0.1202, beats_loss=0.01087, ecapa_loss=0.0002528, whisper_loss=0.1068, over 20547.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01165, ecapa_loss=0.000234, whisper_loss=0.09522, over 3791664.24 frames. ], batch size: 79, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:17:59,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=593820.0, ans=0.2 2024-08-10 14:17:59,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593820.0, ans=0.1 2024-08-10 14:18:05,550 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 14:18:05,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=593820.0, ans=0.125 2024-08-10 14:18:12,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=593920.0, ans=0.125 2024-08-10 14:18:25,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.747e+01 3.189e+01 3.732e+01 5.782e+01, threshold=6.377e+01, percent-clipped=0.0 2024-08-10 14:18:33,982 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:18:38,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=594020.0, ans=0.035 2024-08-10 14:18:56,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1450, loss[loss=0.1105, beats_loss=0.01204, ecapa_loss=0.0002165, whisper_loss=0.09628, over 17847.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01175, ecapa_loss=0.000233, whisper_loss=0.09438, over 3759921.36 frames. ], batch size: 70, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:19:36,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=594320.0, ans=0.125 2024-08-10 14:19:40,753 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 14:19:48,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=594320.0, ans=0.125 2024-08-10 14:20:13,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=594520.0, ans=0.0 2024-08-10 14:20:19,075 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 14:20:24,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.93 vs. limit=15.0 2024-08-10 14:20:40,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1500, loss[loss=0.09594, beats_loss=0.0145, ecapa_loss=0.0001509, whisper_loss=0.07994, over 16841.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01173, ecapa_loss=0.0002324, whisper_loss=0.09433, over 3786635.03 frames. 
], batch size: 64, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:20:44,727 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.432e-01 2024-08-10 14:20:51,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=594720.0, ans=0.125 2024-08-10 14:21:00,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=594820.0, ans=0.0 2024-08-10 14:21:23,480 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 14:21:24,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.686e+01 3.011e+01 3.504e+01 1.040e+02, threshold=6.023e+01, percent-clipped=2.0 2024-08-10 14:21:30,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=595020.0, ans=0.0 2024-08-10 14:21:34,693 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 14:21:52,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-10 14:21:59,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1550, loss[loss=0.1083, beats_loss=0.01162, ecapa_loss=0.0002128, whisper_loss=0.09456, over 20955.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01174, ecapa_loss=0.0002336, whisper_loss=0.0943, over 3775498.01 frames. ], batch size: 82, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:22:18,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=595320.0, ans=0.5 2024-08-10 14:22:31,933 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 14:22:33,526 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 9 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 14:22:34,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2024-08-10 14:22:54,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=595520.0, ans=0.0 2024-08-10 14:22:55,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=595520.0, ans=0.2 2024-08-10 14:22:57,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=595520.0, ans=0.07 2024-08-10 14:22:57,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2024-08-10 14:22:59,606 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 14:23:14,604 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 14:23:15,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1600, loss[loss=0.1084, beats_loss=0.01196, ecapa_loss=0.0002416, whisper_loss=0.09406, over 22353.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01174, ecapa_loss=0.0002328, whisper_loss=0.09356, over 3775041.59 frames. 
], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:23:30,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=595820.0, ans=0.0 2024-08-10 14:23:38,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=595820.0, ans=0.2 2024-08-10 14:23:44,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595820.0, ans=0.1 2024-08-10 14:23:47,890 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 14:23:47,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=595920.0, ans=0.125 2024-08-10 14:23:54,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=595920.0, ans=0.07 2024-08-10 14:23:54,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=595920.0, ans=0.125 2024-08-10 14:23:59,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.802e+01 3.147e+01 3.611e+01 5.289e+01, threshold=6.294e+01, percent-clipped=0.0 2024-08-10 14:24:15,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=12.0 2024-08-10 14:24:27,608 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 14:24:32,493 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 14:24:37,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1650, loss[loss=0.1211, beats_loss=0.01102, ecapa_loss=0.0002008, whisper_loss=0.1081, over 22696.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01171, ecapa_loss=0.0002325, whisper_loss=0.09455, over 3809265.44 frames. ], batch size: 88, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:24:50,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596220.0, ans=0.1 2024-08-10 14:25:17,157 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 14:25:20,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=596420.0, ans=0.0 2024-08-10 14:25:26,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=596520.0, ans=0.0 2024-08-10 14:25:26,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2024-08-10 14:25:52,005 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 14:25:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=596720.0, ans=0.125 2024-08-10 14:25:52,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1700, loss[loss=0.1375, beats_loss=0.009462, ecapa_loss=0.0002403, whisper_loss=0.1256, over 24700.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01159, ecapa_loss=0.000233, whisper_loss=0.09528, over 3804459.94 frames. 
], batch size: 89, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:25:58,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=596720.0, ans=0.2 2024-08-10 14:26:02,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=596720.0, ans=0.09899494936611666 2024-08-10 14:26:04,752 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 39 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 14:26:15,232 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 14:26:34,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.744e+01 3.070e+01 3.564e+01 5.631e+01, threshold=6.139e+01, percent-clipped=0.0 2024-08-10 14:26:44,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=597020.0, ans=0.125 2024-08-10 14:26:49,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-08-10 14:26:57,675 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 43 from LS+wenet, 9 from Vox, 39 fro AS 2024-08-10 14:27:08,787 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1750, loss[loss=0.105, beats_loss=0.01283, ecapa_loss=0.0002177, whisper_loss=0.09, over 21472.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01167, ecapa_loss=0.0002313, whisper_loss=0.09518, over 3805170.45 frames. ], batch size: 89, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:27:28,650 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 14:27:32,482 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 14:27:36,617 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 14:27:37,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-10 14:27:47,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=597420.0, ans=0.2 2024-08-10 14:27:53,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-08-10 14:27:54,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-08-10 14:27:57,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=597520.0, ans=0.05 2024-08-10 14:28:21,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=597620.0, ans=0.1 2024-08-10 14:28:22,305 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 14:28:24,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=597620.0, ans=0.07 2024-08-10 14:28:26,701 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1800, loss[loss=0.1282, beats_loss=0.009819, ecapa_loss=0.0002233, whisper_loss=0.1162, over 18556.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01164, ecapa_loss=0.0002323, whisper_loss=0.09581, over 3834339.31 frames. 
], batch size: 71, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:28:36,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=597720.0, ans=0.0 2024-08-10 14:28:45,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=597820.0, ans=0.125 2024-08-10 14:28:57,147 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 14:29:04,495 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 14:29:07,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.774e+01 3.082e+01 3.729e+01 4.718e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-10 14:29:11,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=598020.0, ans=0.05 2024-08-10 14:29:13,971 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 14:29:24,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-10 14:29:40,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1850, loss[loss=0.09512, beats_loss=0.01436, ecapa_loss=0.0002193, whisper_loss=0.07857, over 20457.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01165, ecapa_loss=0.000233, whisper_loss=0.09556, over 3822764.36 frames. 
], batch size: 84, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:29:43,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=598220.0, ans=0.07 2024-08-10 14:30:29,744 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:30:31,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598520.0, ans=0.125 2024-08-10 14:30:45,754 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 14:30:57,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1900, loss[loss=0.1091, beats_loss=0.01166, ecapa_loss=0.0002467, whisper_loss=0.09501, over 22144.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01164, ecapa_loss=0.0002364, whisper_loss=0.09502, over 3767685.22 frames. ], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:31:05,208 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 13 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 14:31:15,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=598820.0, ans=0.2 2024-08-10 14:31:21,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.15 vs. 
limit=22.5 2024-08-10 14:31:35,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598920.0, ans=0.1 2024-08-10 14:31:41,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.853e+01 3.252e+01 3.827e+01 6.548e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-10 14:31:49,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=599020.0, ans=0.0 2024-08-10 14:31:56,502 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 14:32:07,800 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.97 vs. limit=22.5 2024-08-10 14:32:09,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2024-08-10 14:32:14,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 1950, loss[loss=0.1072, beats_loss=0.01505, ecapa_loss=0.0002298, whisper_loss=0.08983, over 21297.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01179, ecapa_loss=0.0002395, whisper_loss=0.0937, over 3776490.04 frames. ], batch size: 84, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:32:17,324 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 15 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 14:32:34,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=599320.0, ans=0.125 2024-08-10 14:32:37,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2024-08-10 14:32:41,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=599320.0, ans=0.035 2024-08-10 14:32:53,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=599420.0, ans=0.0 2024-08-10 14:32:57,871 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 14:33:01,559 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 14:33:06,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=599520.0, ans=22.5 2024-08-10 14:33:16,794 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 14:33:20,707 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-10 14:33:30,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2000, loss[loss=0.08102, beats_loss=0.0138, ecapa_loss=0.0002079, whisper_loss=0.06514, over 17836.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01182, ecapa_loss=0.0002409, whisper_loss=0.09356, over 3785049.23 frames. ], batch size: 71, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:33:42,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=599720.0, ans=0.125 2024-08-10 14:33:43,706 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 14:33:50,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=599820.0, ans=0.125 2024-08-10 14:33:51,833 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 14:33:53,004 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 14:34:16,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.787e+01 3.156e+01 3.560e+01 5.120e+01, threshold=6.313e+01, percent-clipped=0.0 2024-08-10 14:34:30,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2024-08-10 14:34:34,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=600120.0, ans=22.5 2024-08-10 14:34:38,699 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 14:34:39,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=600120.0, ans=0.0 2024-08-10 14:34:43,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=600120.0, ans=0.0 2024-08-10 14:34:50,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2050, loss[loss=0.1081, beats_loss=0.01367, ecapa_loss=0.0002331, whisper_loss=0.09208, over 20372.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01182, ecapa_loss=0.0002409, whisper_loss=0.0944, over 3807577.80 frames. ], batch size: 80, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:35:10,463 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 14:35:31,649 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 14:35:35,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600520.0, ans=0.1 2024-08-10 14:35:35,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=600520.0, ans=0.125 2024-08-10 14:35:40,418 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 14:35:43,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=600520.0, ans=0.2 2024-08-10 14:35:54,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=600620.0, ans=10.0 2024-08-10 14:36:04,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2100, loss[loss=0.1391, beats_loss=0.008926, ecapa_loss=0.0002562, whisper_loss=0.1276, over 17971.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01178, ecapa_loss=0.0002432, whisper_loss=0.09468, over 3799833.15 frames. ], batch size: 67, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:36:05,688 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 14:36:07,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=600720.0, ans=0.0 2024-08-10 14:36:13,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-08-10 14:36:14,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=600720.0, ans=0.0 2024-08-10 14:36:32,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=600820.0, ans=0.2 2024-08-10 14:36:46,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.750e+01 3.110e+01 3.646e+01 5.998e+01, threshold=6.220e+01, percent-clipped=0.0 2024-08-10 14:36:58,783 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 14:37:03,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=601120.0, ans=0.125 2024-08-10 14:37:05,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=601120.0, ans=0.0 2024-08-10 14:37:05,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-10 14:37:09,325 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 14:37:12,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2024-08-10 14:37:17,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=601120.0, ans=0.125 2024-08-10 14:37:19,336 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2150, loss[loss=0.08096, beats_loss=0.01378, ecapa_loss=0.0002884, whisper_loss=0.06429, over 15256.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01181, ecapa_loss=0.0002425, whisper_loss=0.09466, over 3809819.28 frames. 
], batch size: 66, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:37:52,786 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 14:38:02,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=601420.0, ans=0.0 2024-08-10 14:38:02,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=601420.0, ans=0.0 2024-08-10 14:38:07,859 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 14:38:10,654 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 14:38:11,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=601520.0, ans=0.0 2024-08-10 14:38:33,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2024-08-10 14:38:35,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2200, loss[loss=0.106, beats_loss=0.009799, ecapa_loss=0.0002908, whisper_loss=0.09332, over 18980.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01185, ecapa_loss=0.0002431, whisper_loss=0.09467, over 3798981.16 frames. ], batch size: 77, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:38:45,964 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
20 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 14:38:58,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=601820.0, ans=0.125 2024-08-10 14:39:14,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.850e+01 3.154e+01 3.768e+01 5.598e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 14:39:35,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=602120.0, ans=0.0 2024-08-10 14:39:41,955 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 14:39:42,994 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2250, loss[loss=0.1076, beats_loss=0.01256, ecapa_loss=0.0002485, whisper_loss=0.09252, over 22766.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0118, ecapa_loss=0.0002449, whisper_loss=0.09551, over 3815512.21 frames. ], batch size: 93, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:39:53,458 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-10 14:39:53,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=602220.0, ans=0.125 2024-08-10 14:40:00,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=602320.0, ans=0.125 2024-08-10 14:40:08,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.27 vs. limit=10.0 2024-08-10 14:40:08,784 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 38 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 14:40:14,939 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 14:40:18,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=602420.0, ans=10.0 2024-08-10 14:40:30,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.42 vs. limit=22.5 2024-08-10 14:40:33,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=602620.0, ans=0.2 2024-08-10 14:40:47,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2300, loss[loss=0.1064, beats_loss=0.01155, ecapa_loss=0.000199, whisper_loss=0.09281, over 14363.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01175, ecapa_loss=0.0002458, whisper_loss=0.0961, over 3848700.66 frames. ], batch size: 54, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:40:52,238 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 14:40:55,449 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.946e-02 2024-08-10 14:40:55,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=602720.0, ans=0.2 2024-08-10 14:41:06,471 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 14:41:12,896 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
20 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-10 14:41:19,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=602920.0, ans=15.0 2024-08-10 14:41:23,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.867e+01 3.175e+01 3.741e+01 6.464e+01, threshold=6.350e+01, percent-clipped=1.0 2024-08-10 14:41:23,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=602920.0, ans=0.07 2024-08-10 14:41:42,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2024-08-10 14:41:49,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=603120.0, ans=0.2 2024-08-10 14:41:51,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2350, loss[loss=0.09609, beats_loss=0.01397, ecapa_loss=0.0002553, whisper_loss=0.07957, over 17634.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0119, ecapa_loss=0.0002451, whisper_loss=0.09565, over 3848775.29 frames. ], batch size: 74, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:41:55,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-08-10 14:41:56,446 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 14:42:29,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2024-08-10 14:42:44,470 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 14:42:47,084 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:42:51,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=603620.0, ans=0.0 2024-08-10 14:42:55,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2400, loss[loss=0.1072, beats_loss=0.01264, ecapa_loss=0.0001712, whisper_loss=0.0929, over 23274.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01179, ecapa_loss=0.0002439, whisper_loss=0.09617, over 3854744.40 frames. ], batch size: 89, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:42:56,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=12.0 2024-08-10 14:42:59,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=603720.0, ans=0.07 2024-08-10 14:43:02,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-08-10 14:43:03,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=603720.0, ans=0.0 2024-08-10 14:43:05,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=603720.0, ans=0.125 2024-08-10 14:43:08,192 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 14:43:16,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=603820.0, ans=0.07 2024-08-10 14:43:20,701 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:43:24,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=603920.0, ans=0.125 2024-08-10 14:43:30,720 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 14:43:31,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.726e+01 3.127e+01 3.676e+01 5.177e+01, threshold=6.255e+01, percent-clipped=0.0 2024-08-10 14:43:33,176 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 14:43:34,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=12.0 2024-08-10 14:43:34,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=604020.0, ans=0.125 2024-08-10 14:43:36,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=604020.0, ans=0.125 2024-08-10 14:43:40,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=604020.0, ans=0.125 2024-08-10 14:43:57,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=604120.0, ans=0.0 2024-08-10 14:44:00,611 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2450, loss[loss=0.1113, beats_loss=0.009949, ecapa_loss=0.0002477, whisper_loss=0.09891, over 18378.00 frames. 
], tot_loss[loss=0.1104, beats_loss=0.01177, ecapa_loss=0.0002434, whisper_loss=0.09618, over 3841614.75 frames. ], batch size: 72, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:44:02,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-10 14:44:07,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=604220.0, ans=0.0 2024-08-10 14:44:18,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-08-10 14:44:20,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.03 vs. limit=15.0 2024-08-10 14:44:21,452 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 14:44:29,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-10 14:44:30,669 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 14:44:34,471 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 14:44:36,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604420.0, ans=0.125 2024-08-10 14:44:40,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-10 14:45:03,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.94 vs. 
limit=15.0 2024-08-10 14:45:05,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2500, loss[loss=0.1058, beats_loss=0.01066, ecapa_loss=0.0002884, whisper_loss=0.09221, over 21430.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.0118, ecapa_loss=0.0002412, whisper_loss=0.09639, over 3848183.07 frames. ], batch size: 90, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:45:07,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=604720.0, ans=0.0 2024-08-10 14:45:42,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.835e+01 3.123e+01 3.643e+01 5.985e+01, threshold=6.245e+01, percent-clipped=0.0 2024-08-10 14:45:59,353 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 14:46:11,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2550, loss[loss=0.1489, beats_loss=0.011, ecapa_loss=0.0002124, whisper_loss=0.1358, over 15615.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01174, ecapa_loss=0.0002419, whisper_loss=0.09682, over 3863842.26 frames. ], batch size: 61, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:46:20,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605220.0, ans=0.1 2024-08-10 14:46:20,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=12.0 2024-08-10 14:46:29,012 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 14:46:38,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=605420.0, ans=0.95 2024-08-10 14:46:48,061 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 14:46:53,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=605520.0, ans=0.125 2024-08-10 14:46:59,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-10 14:47:05,389 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 14:47:13,018 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 14:47:15,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2600, loss[loss=0.1125, beats_loss=0.009498, ecapa_loss=0.0002576, whisper_loss=0.1004, over 16755.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01163, ecapa_loss=0.0002436, whisper_loss=0.09714, over 3867195.88 frames. ], batch size: 68, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:47:16,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.50 vs. 
limit=22.5 2024-08-10 14:47:40,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605920.0, ans=0.1 2024-08-10 14:47:51,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.738e+01 3.065e+01 3.602e+01 6.052e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 14:47:55,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=606020.0, ans=0.125 2024-08-10 14:47:56,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=606020.0, ans=0.0 2024-08-10 14:48:00,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=606020.0, ans=12.0 2024-08-10 14:48:20,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2650, loss[loss=0.111, beats_loss=0.0122, ecapa_loss=0.0002267, whisper_loss=0.09655, over 21735.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01174, ecapa_loss=0.0002451, whisper_loss=0.09617, over 3893718.86 frames. ], batch size: 84, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:48:21,905 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 14:48:22,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=606220.0, ans=0.09899494936611666 2024-08-10 14:48:25,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=22.5 2024-08-10 14:48:25,960 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
16 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-10 14:48:39,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=606320.0, ans=0.125 2024-08-10 14:49:11,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=606620.0, ans=0.2 2024-08-10 14:49:15,464 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 14:49:19,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=606620.0, ans=0.125 2024-08-10 14:49:25,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2700, loss[loss=0.1105, beats_loss=0.01219, ecapa_loss=0.0002728, whisper_loss=0.09557, over 22876.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01177, ecapa_loss=0.0002445, whisper_loss=0.09604, over 3901795.47 frames. ], batch size: 93, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:49:27,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=606720.0, ans=0.07 2024-08-10 14:49:27,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=606720.0, ans=0.0 2024-08-10 14:50:02,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 3.024e+01 3.379e+01 4.188e+01 8.555e+01, threshold=6.757e+01, percent-clipped=2.0 2024-08-10 14:50:26,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.84 vs. limit=10.0 2024-08-10 14:50:30,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2750, loss[loss=0.1127, beats_loss=0.01121, ecapa_loss=0.0002433, whisper_loss=0.09907, over 18901.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01189, ecapa_loss=0.0002423, whisper_loss=0.09484, over 3885751.83 frames. 
], batch size: 75, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:50:32,631 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 14:50:39,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=607220.0, ans=0.2 2024-08-10 14:50:42,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=607220.0, ans=0.125 2024-08-10 14:50:52,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=607320.0, ans=0.5 2024-08-10 14:51:03,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-10 14:51:11,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=607520.0, ans=0.0 2024-08-10 14:51:14,893 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 14:51:18,809 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 14:51:30,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=607620.0, ans=0.125 2024-08-10 14:51:33,271 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 14:51:36,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2800, loss[loss=0.1127, beats_loss=0.01378, ecapa_loss=0.0002184, whisper_loss=0.09671, over 17048.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01183, ecapa_loss=0.0002433, whisper_loss=0.09519, over 3877011.87 frames. 
], batch size: 69, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:51:37,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-10 14:51:40,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607720.0, ans=0.1 2024-08-10 14:52:02,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=607920.0, ans=0.95 2024-08-10 14:52:12,874 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 14:52:14,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.765e+01 3.202e+01 3.631e+01 5.642e+01, threshold=6.403e+01, percent-clipped=0.0 2024-08-10 14:52:31,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=608120.0, ans=0.125 2024-08-10 14:52:35,155 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 14:52:39,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=608120.0, ans=0.2 2024-08-10 14:52:42,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2850, loss[loss=0.1154, beats_loss=0.01406, ecapa_loss=0.0002321, whisper_loss=0.09905, over 20309.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0118, ecapa_loss=0.0002438, whisper_loss=0.09568, over 3869908.31 frames. ], batch size: 82, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:52:52,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. 
limit=10.0 2024-08-10 14:53:00,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=608320.0, ans=0.05 2024-08-10 14:53:01,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-10 14:53:06,694 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 14:53:10,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=608420.0, ans=0.0 2024-08-10 14:53:27,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=608520.0, ans=0.125 2024-08-10 14:53:32,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=608520.0, ans=0.0 2024-08-10 14:53:47,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2900, loss[loss=0.09614, beats_loss=0.01194, ecapa_loss=0.0002476, whisper_loss=0.08173, over 18108.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01186, ecapa_loss=0.0002447, whisper_loss=0.09538, over 3860286.01 frames. ], batch size: 73, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:53:59,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.80 vs. 
limit=22.5 2024-08-10 14:54:13,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=608920.0, ans=0.125 2024-08-10 14:54:15,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=608920.0, ans=0.125 2024-08-10 14:54:17,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=608920.0, ans=0.025 2024-08-10 14:54:21,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=608920.0, ans=0.0 2024-08-10 14:54:24,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.824e+01 3.286e+01 3.731e+01 5.146e+01, threshold=6.573e+01, percent-clipped=0.0 2024-08-10 14:54:33,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=609020.0, ans=0.0 2024-08-10 14:54:37,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=609020.0, ans=0.07 2024-08-10 14:54:49,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609120.0, ans=0.125 2024-08-10 14:54:53,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 2950, loss[loss=0.1116, beats_loss=0.01241, ecapa_loss=0.0002464, whisper_loss=0.09677, over 19453.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01189, ecapa_loss=0.0002443, whisper_loss=0.09542, over 3876359.13 frames. 
], batch size: 79, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:54:54,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=609220.0, ans=0.125 2024-08-10 14:54:58,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=609220.0, ans=0.125 2024-08-10 14:55:08,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=609320.0, ans=0.125 2024-08-10 14:55:08,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=609320.0, ans=0.2 2024-08-10 14:55:09,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=609320.0, ans=0.2 2024-08-10 14:55:25,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=609420.0, ans=0.1 2024-08-10 14:55:38,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-08-10 14:55:43,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=609520.0, ans=0.125 2024-08-10 14:55:45,751 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 14:55:58,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3000, loss[loss=0.1083, beats_loss=0.01192, ecapa_loss=0.0002499, whisper_loss=0.09392, over 22795.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01191, ecapa_loss=0.0002462, whisper_loss=0.09537, over 3879833.10 frames. 
], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:55:58,282 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 14:56:35,292 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on ASR_libri: loss=0.2643, beats_loss=0, ecapa_loss=0.0007548, whisper_loss=0.2568, over 922467.00 frames. 2024-08-10 14:56:42,758 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9025, 1.5982, 2.3413, 0.9061, 0.8026, 1.7570, 2.2697, 2.1626], device='cuda:3') 2024-08-10 14:56:52,804 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on SV_voxceleb1: loss=0.006405, beats_loss=0, ecapa_loss=0.0006405, whisper_loss=0, over 939242.00 frames. 2024-08-10 14:57:06,006 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0318, 0.0398, 0.0117, 3.5193, 0.0211, 0.0687, 0.0506, 0.0542], device='cuda:3') 2024-08-10 14:58:27,381 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7226, 1.3047, 1.0727, 0.4681, 0.7900, 0.8554, 0.9789, 1.0519], device='cuda:3') 2024-08-10 14:58:43,692 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on AT_audioset: loss=0.02683, beats_loss=0.02683, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 14:58:43,695 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 14:59:00,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-08-10 14:59:02,931 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 14:59:08,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=609920.0, ans=0.0 2024-08-10 14:59:19,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.032e+01 3.491e+01 3.911e+01 5.761e+01, threshold=6.982e+01, percent-clipped=0.0 2024-08-10 14:59:21,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610020.0, ans=0.125 2024-08-10 14:59:27,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=610020.0, ans=0.0 2024-08-10 14:59:48,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3050, loss[loss=0.126, beats_loss=0.01045, ecapa_loss=0.0002247, whisper_loss=0.1133, over 23954.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01186, ecapa_loss=0.0002472, whisper_loss=0.09618, over 3905676.77 frames. ], batch size: 91, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:59:51,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610220.0, ans=0.1 2024-08-10 15:00:02,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=610320.0, ans=0.125 2024-08-10 15:00:08,331 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 15:00:09,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=12.0 2024-08-10 15:00:19,780 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 13 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 15:00:27,774 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 15:00:29,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-10 15:00:42,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610620.0, ans=0.125 2024-08-10 15:00:54,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3100, loss[loss=0.1352, beats_loss=0.009618, ecapa_loss=0.0002867, whisper_loss=0.1227, over 18749.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01192, ecapa_loss=0.0002485, whisper_loss=0.09556, over 3874177.54 frames. ], batch size: 75, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:01:05,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610720.0, ans=0.125 2024-08-10 15:01:15,567 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 15:01:15,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=610820.0, ans=0.2 2024-08-10 15:01:32,068 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 15:01:33,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.743e+01 3.024e+01 3.559e+01 5.609e+01, threshold=6.048e+01, percent-clipped=0.0 2024-08-10 15:01:39,873 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 15:02:03,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3150, loss[loss=0.1032, beats_loss=0.00945, ecapa_loss=0.0002584, whisper_loss=0.09114, over 19169.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01183, ecapa_loss=0.0002488, whisper_loss=0.0959, over 3875669.40 frames. 
], batch size: 74, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:02:11,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=611220.0, ans=0.125 2024-08-10 15:02:28,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2024-08-10 15:02:41,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=611420.0, ans=0.05 2024-08-10 15:02:43,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=611420.0, ans=0.0 2024-08-10 15:02:51,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611520.0, ans=0.125 2024-08-10 15:02:54,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=611520.0, ans=0.0 2024-08-10 15:03:11,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2024-08-10 15:03:15,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3200, loss[loss=0.09367, beats_loss=0.01318, ecapa_loss=0.0002434, whisper_loss=0.07806, over 20386.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01189, ecapa_loss=0.0002477, whisper_loss=0.09573, over 3868593.29 frames. ], batch size: 85, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:03:16,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-10 15:03:21,373 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
27 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 15:03:34,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=611820.0, ans=0.0 2024-08-10 15:03:35,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=611820.0, ans=0.125 2024-08-10 15:03:38,280 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 15:03:40,686 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-10 15:03:47,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=611920.0, ans=0.125 2024-08-10 15:03:56,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.779e+01 3.150e+01 3.545e+01 6.901e+01, threshold=6.301e+01, percent-clipped=2.0 2024-08-10 15:04:01,692 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 15:04:07,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=612020.0, ans=0.125 2024-08-10 15:04:08,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=612020.0, ans=0.0 2024-08-10 15:04:17,452 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 15:04:19,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=612120.0, ans=0.0 2024-08-10 15:04:25,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. 
limit=15.0 2024-08-10 15:04:28,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3250, loss[loss=0.1066, beats_loss=0.01151, ecapa_loss=0.0002551, whisper_loss=0.09259, over 16640.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01176, ecapa_loss=0.0002502, whisper_loss=0.09628, over 3850277.06 frames. ], batch size: 64, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:04:31,682 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 15:04:47,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=612320.0, ans=0.0 2024-08-10 15:05:06,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=612420.0, ans=0.07 2024-08-10 15:05:29,298 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 15:05:40,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3300, loss[loss=0.08073, beats_loss=0.01381, ecapa_loss=0.0002035, whisper_loss=0.06489, over 23095.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01185, ecapa_loss=0.0002493, whisper_loss=0.09604, over 3828918.63 frames. ], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:05:56,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=22.5 2024-08-10 15:06:06,128 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 15 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 15:06:22,297 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.755e+01 3.072e+01 3.647e+01 1.345e+02, threshold=6.143e+01, percent-clipped=1.0 2024-08-10 15:06:25,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=613020.0, ans=0.125 2024-08-10 15:06:28,356 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 15:06:33,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-10 15:06:48,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=613120.0, ans=0.2 2024-08-10 15:06:54,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3350, loss[loss=0.1135, beats_loss=0.009003, ecapa_loss=0.0002625, whisper_loss=0.1018, over 13988.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01183, ecapa_loss=0.0002491, whisper_loss=0.09508, over 3847765.63 frames. ], batch size: 56, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:07:04,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613220.0, ans=0.0 2024-08-10 15:07:05,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=613220.0, ans=0.2 2024-08-10 15:07:08,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=613320.0, ans=0.125 2024-08-10 15:07:11,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.22 vs. 
limit=15.0 2024-08-10 15:07:24,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=613420.0, ans=0.125 2024-08-10 15:07:29,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613420.0, ans=0.1 2024-08-10 15:07:41,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=613520.0, ans=0.125 2024-08-10 15:07:52,261 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 10 from Vox, 51 fro AS 2024-08-10 15:07:56,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=613620.0, ans=0.125 2024-08-10 15:07:58,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=22.5 2024-08-10 15:08:01,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-10 15:08:08,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3400, loss[loss=0.1019, beats_loss=0.01103, ecapa_loss=0.0002423, whisper_loss=0.08844, over 18105.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01176, ecapa_loss=0.000248, whisper_loss=0.0953, over 3864419.57 frames. ], batch size: 72, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:08:09,693 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 15:08:23,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=613820.0, ans=0.0 2024-08-10 15:08:29,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=613820.0, ans=0.07 2024-08-10 15:08:44,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=613920.0, ans=0.125 2024-08-10 15:08:49,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.884e+01 3.210e+01 3.796e+01 7.234e+01, threshold=6.419e+01, percent-clipped=1.0 2024-08-10 15:09:03,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=614020.0, ans=0.125 2024-08-10 15:09:07,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=614120.0, ans=0.125 2024-08-10 15:09:07,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=614120.0, ans=0.2 2024-08-10 15:09:09,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=614120.0, ans=0.125 2024-08-10 15:09:13,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2024-08-10 15:09:13,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. 
limit=15.0 2024-08-10 15:09:15,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=614120.0, ans=0.125 2024-08-10 15:09:21,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3450, loss[loss=0.09449, beats_loss=0.01343, ecapa_loss=0.0002717, whisper_loss=0.07834, over 21906.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002474, whisper_loss=0.09467, over 3859020.32 frames. ], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:09:44,726 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 15:10:04,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-10 15:10:07,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2024-08-10 15:10:14,485 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.581e-01 2024-08-10 15:10:18,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=614620.0, ans=0.125 2024-08-10 15:10:18,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-10 15:10:34,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3500, loss[loss=0.1037, beats_loss=0.01164, ecapa_loss=0.0002274, whisper_loss=0.08976, over 23274.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01189, ecapa_loss=0.0002472, whisper_loss=0.09467, over 3875902.74 frames. 
], batch size: 94, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:10:39,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614720.0, ans=0.1 2024-08-10 15:10:42,207 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 15:10:45,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614720.0, ans=0.1 2024-08-10 15:11:15,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+01 2.754e+01 3.128e+01 3.525e+01 7.630e+01, threshold=6.256e+01, percent-clipped=1.0 2024-08-10 15:11:17,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615020.0, ans=0.125 2024-08-10 15:11:31,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=615120.0, ans=0.0 2024-08-10 15:11:35,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-08-10 15:11:46,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3550, loss[loss=0.1095, beats_loss=0.01328, ecapa_loss=0.0002095, whisper_loss=0.09417, over 20531.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01187, ecapa_loss=0.0002468, whisper_loss=0.09466, over 3868732.18 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:12:09,713 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 15:12:27,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615420.0, ans=0.1 2024-08-10 15:12:48,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-10 15:12:55,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=615620.0, ans=0.125 2024-08-10 15:12:58,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3600, loss[loss=0.1137, beats_loss=0.01209, ecapa_loss=0.0002351, whisper_loss=0.09925, over 22240.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01188, ecapa_loss=0.0002454, whisper_loss=0.09448, over 3837856.55 frames. ], batch size: 90, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:13:03,652 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 15:13:05,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=615720.0, ans=0.125 2024-08-10 15:13:09,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.62 vs. limit=22.5 2024-08-10 15:13:10,946 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 15:13:36,031 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 15:13:39,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.884e+01 3.216e+01 3.547e+01 5.586e+01, threshold=6.432e+01, percent-clipped=0.0 2024-08-10 15:14:11,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3650, loss[loss=0.09613, beats_loss=0.01421, ecapa_loss=0.0002551, whisper_loss=0.07937, over 16449.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01196, ecapa_loss=0.0002432, whisper_loss=0.09344, over 3787019.87 frames. ], batch size: 69, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:14:29,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=616320.0, ans=0.125 2024-08-10 15:14:30,364 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 15:14:45,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-10 15:14:52,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-08-10 15:14:58,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=616520.0, ans=0.07 2024-08-10 15:15:08,640 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 15:15:19,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=616620.0, ans=0.125 2024-08-10 15:15:22,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. 
limit=10.0 2024-08-10 15:15:23,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3700, loss[loss=0.1041, beats_loss=0.01308, ecapa_loss=0.0002193, whisper_loss=0.08882, over 20336.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01192, ecapa_loss=0.000245, whisper_loss=0.09382, over 3785092.40 frames. ], batch size: 79, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:15:26,946 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 15:16:05,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.736e+01 3.079e+01 3.558e+01 5.544e+01, threshold=6.157e+01, percent-clipped=0.0 2024-08-10 15:16:07,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-08-10 15:16:31,467 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 28 from LS+wenet, 12 from Vox, 17 fro AS 2024-08-10 15:16:33,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=617120.0, ans=0.125 2024-08-10 15:16:37,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3750, loss[loss=0.09027, beats_loss=0.01549, ecapa_loss=0.0002098, whisper_loss=0.07268, over 20158.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0119, ecapa_loss=0.0002456, whisper_loss=0.09375, over 3792584.32 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:16:42,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=617220.0, ans=0.125 2024-08-10 15:17:00,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=617320.0, ans=0.2 2024-08-10 15:17:15,769 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 15:17:18,832 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:17:22,050 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.129e-02 2024-08-10 15:17:23,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=617520.0, ans=0.0 2024-08-10 15:17:23,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=617520.0, ans=0.0 2024-08-10 15:17:29,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=12.0 2024-08-10 15:17:40,861 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 15:17:49,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3800, loss[loss=0.09366, beats_loss=0.01176, ecapa_loss=0.0002284, whisper_loss=0.07962, over 15229.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01193, ecapa_loss=0.0002439, whisper_loss=0.094, over 3811746.57 frames. ], batch size: 61, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:17:50,117 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 15:17:59,382 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 15:18:04,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=617820.0, ans=0.2 2024-08-10 15:18:11,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=617820.0, ans=0.0 2024-08-10 15:18:25,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2024-08-10 15:18:30,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.844e+01 3.115e+01 3.732e+01 5.922e+01, threshold=6.230e+01, percent-clipped=0.0 2024-08-10 15:18:40,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=618020.0, ans=0.0 2024-08-10 15:18:41,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=618020.0, ans=0.125 2024-08-10 15:18:59,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=618120.0, ans=0.125 2024-08-10 15:19:02,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618220.0, ans=0.1 2024-08-10 15:19:02,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3850, loss[loss=0.111, beats_loss=0.01273, ecapa_loss=0.0002574, whisper_loss=0.09572, over 22569.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01196, ecapa_loss=0.0002419, whisper_loss=0.09477, over 3845232.45 frames. 
], batch size: 93, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:19:23,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=618320.0, ans=0.0 2024-08-10 15:19:27,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-10 15:19:46,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=618420.0, ans=0.125 2024-08-10 15:19:47,411 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 15:20:23,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=618620.0, ans=15.0 2024-08-10 15:20:31,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3900, loss[loss=0.1188, beats_loss=0.009325, ecapa_loss=0.0002877, whisper_loss=0.1066, over 19688.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01193, ecapa_loss=0.0002434, whisper_loss=0.09558, over 3867813.62 frames. ], batch size: 82, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:20:33,032 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 15:20:38,276 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.949e+05 2024-08-10 15:20:44,192 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 15:20:46,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=618720.0, ans=0.125 2024-08-10 15:20:46,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.53 vs. 
limit=15.0 2024-08-10 15:20:48,173 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 15:21:00,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2024-08-10 15:21:19,979 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 15:21:23,382 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 15:21:24,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 3.061e+01 3.504e+01 4.098e+01 1.751e+02, threshold=7.008e+01, percent-clipped=3.0 2024-08-10 15:21:38,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=619020.0, ans=0.0 2024-08-10 15:21:40,990 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 15:22:04,862 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 15:22:11,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 3950, loss[loss=0.1222, beats_loss=0.009961, ecapa_loss=0.0003025, whisper_loss=0.1092, over 20473.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01189, ecapa_loss=0.000245, whisper_loss=0.09584, over 3868025.15 frames. ], batch size: 82, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:22:13,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=619220.0, ans=0.0 2024-08-10 15:22:21,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-08-10 15:22:56,747 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
36 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 15:23:24,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=619520.0, ans=0.125 2024-08-10 15:23:54,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4000, loss[loss=0.1087, beats_loss=0.01214, ecapa_loss=0.0002399, whisper_loss=0.09419, over 22664.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01189, ecapa_loss=0.0002437, whisper_loss=0.09608, over 3860645.53 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:24:08,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=619720.0, ans=0.125 2024-08-10 15:24:47,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619920.0, ans=0.125 2024-08-10 15:25:02,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.861e+01 3.318e+01 3.884e+01 5.554e+01, threshold=6.636e+01, percent-clipped=0.0 2024-08-10 15:25:10,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620020.0, ans=0.125 2024-08-10 15:25:30,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=620120.0, ans=0.125 2024-08-10 15:25:52,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4050, loss[loss=0.1081, beats_loss=0.01202, ecapa_loss=0.0002216, whisper_loss=0.09389, over 20422.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.0118, ecapa_loss=0.0002442, whisper_loss=0.09612, over 3868203.47 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 68719476736.0 2024-08-10 15:26:01,208 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 15:26:02,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2024-08-10 15:26:07,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=620220.0, ans=0.0 2024-08-10 15:26:15,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=620320.0, ans=0.025 2024-08-10 15:26:31,482 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 15:26:44,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=620420.0, ans=10.0 2024-08-10 15:26:51,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=620420.0, ans=10.0 2024-08-10 15:26:58,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=620420.0, ans=0.125 2024-08-10 15:27:36,538 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 15:27:43,822 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 15:27:50,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4100, loss[loss=0.1371, beats_loss=0.009027, ecapa_loss=0.0002793, whisper_loss=0.1252, over 14689.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01172, ecapa_loss=0.0002452, whisper_loss=0.09638, over 3863478.40 frames. ], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:27:55,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. 
limit=6.0 2024-08-10 15:27:57,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620720.0, ans=0.125 2024-08-10 15:28:11,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620720.0, ans=0.125 2024-08-10 15:28:27,449 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 15:28:39,156 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.881e+05 2024-08-10 15:28:50,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=620920.0, ans=0.2 2024-08-10 15:28:52,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620920.0, ans=0.1 2024-08-10 15:28:59,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+01 2.982e+01 3.358e+01 3.918e+01 5.492e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-10 15:29:16,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=621020.0, ans=0.04949747468305833 2024-08-10 15:29:29,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=621120.0, ans=0.125 2024-08-10 15:29:33,270 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 15:29:34,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4150, loss[loss=0.1088, beats_loss=0.0119, ecapa_loss=0.0002147, whisper_loss=0.09475, over 20176.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01178, ecapa_loss=0.0002442, whisper_loss=0.09638, over 3879818.07 frames. 
], batch size: 77, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:30:01,974 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 15:30:04,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621420.0, ans=0.1 2024-08-10 15:30:38,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=621620.0, ans=0.0 2024-08-10 15:30:49,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4200, loss[loss=0.1088, beats_loss=0.0106, ecapa_loss=0.0002656, whisper_loss=0.09553, over 17227.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.0119, ecapa_loss=0.0002419, whisper_loss=0.09612, over 3899306.02 frames. ], batch size: 66, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:30:54,917 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 15:31:08,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=621820.0, ans=0.0 2024-08-10 15:31:13,123 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:31:17,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2024-08-10 15:31:31,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.809e+01 3.141e+01 3.651e+01 6.704e+01, threshold=6.282e+01, percent-clipped=0.0 2024-08-10 15:31:36,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=622020.0, ans=0.125 2024-08-10 15:31:38,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-10 15:31:45,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622020.0, ans=0.1 2024-08-10 15:31:45,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.48 vs. limit=22.5 2024-08-10 15:31:56,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=12.0 2024-08-10 15:32:02,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2024-08-10 15:32:05,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4250, loss[loss=0.1096, beats_loss=0.01188, ecapa_loss=0.0002279, whisper_loss=0.09539, over 15077.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01192, ecapa_loss=0.0002413, whisper_loss=0.09542, over 3927459.38 frames. 
], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:32:21,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=622320.0, ans=0.0 2024-08-10 15:32:38,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=622420.0, ans=0.0 2024-08-10 15:32:39,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622420.0, ans=0.1 2024-08-10 15:32:50,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622520.0, ans=0.125 2024-08-10 15:33:13,303 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 15:33:19,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4300, loss[loss=0.1182, beats_loss=0.01077, ecapa_loss=0.0002444, whisper_loss=0.105, over 23174.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01183, ecapa_loss=0.0002401, whisper_loss=0.09549, over 3897838.90 frames. ], batch size: 93, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:33:19,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622720.0, ans=0.125 2024-08-10 15:33:23,919 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 15:33:25,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=622720.0, ans=0.125 2024-08-10 15:33:40,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622820.0, ans=0.1 2024-08-10 15:33:45,363 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 15:33:54,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=622920.0, ans=0.125 2024-08-10 15:33:57,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622920.0, ans=0.125 2024-08-10 15:33:59,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.798e+01 3.084e+01 3.774e+01 7.124e+01, threshold=6.168e+01, percent-clipped=2.0 2024-08-10 15:34:05,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623020.0, ans=0.1 2024-08-10 15:34:08,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=623020.0, ans=0.125 2024-08-10 15:34:13,096 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 15:34:14,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.051e-03 2024-08-10 15:34:19,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=12.0 2024-08-10 15:34:22,289 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 15:34:30,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4350, loss[loss=0.131, beats_loss=0.01044, ecapa_loss=0.0002914, whisper_loss=0.1177, over 22661.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01183, ecapa_loss=0.000242, whisper_loss=0.09512, over 3872858.92 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:34:33,798 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 15:34:44,758 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 15:34:47,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=623320.0, ans=0.125 2024-08-10 15:34:58,527 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 15:34:58,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=623420.0, ans=0.125 2024-08-10 15:35:05,514 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 15:35:05,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=623420.0, ans=0.0 2024-08-10 15:35:17,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=623520.0, ans=0.125 2024-08-10 15:35:33,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623620.0, ans=0.125 2024-08-10 15:35:48,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=623620.0, ans=0.125 2024-08-10 15:35:50,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4400, loss[loss=0.09054, beats_loss=0.01418, ecapa_loss=0.0002363, whisper_loss=0.07399, over 13949.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01186, ecapa_loss=0.0002437, whisper_loss=0.09559, over 3905791.47 frames. 
], batch size: 55, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:36:06,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=623720.0, ans=15.0 2024-08-10 15:36:18,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=623820.0, ans=0.125 2024-08-10 15:36:36,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-10 15:36:38,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.993e+01 3.424e+01 4.007e+01 6.509e+01, threshold=6.848e+01, percent-clipped=2.0 2024-08-10 15:36:39,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623920.0, ans=0.1 2024-08-10 15:37:01,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=624120.0, ans=0.0 2024-08-10 15:37:03,177 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 15:37:13,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=624120.0, ans=0.125 2024-08-10 15:37:14,856 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.430e-01 2024-08-10 15:37:15,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4450, loss[loss=0.1255, beats_loss=0.01283, ecapa_loss=0.0001885, whisper_loss=0.1108, over 15341.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.0118, ecapa_loss=0.0002439, whisper_loss=0.09635, over 3920101.53 frames. 
], batch size: 59, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:37:29,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624220.0, ans=0.125 2024-08-10 15:37:35,769 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 15:37:40,244 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 15:37:43,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=624320.0, ans=0.125 2024-08-10 15:38:01,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624420.0, ans=0.125 2024-08-10 15:38:01,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624420.0, ans=0.125 2024-08-10 15:38:11,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=624520.0, ans=0.0 2024-08-10 15:38:22,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=624620.0, ans=0.125 2024-08-10 15:38:22,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=624620.0, ans=0.2 2024-08-10 15:38:39,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4500, loss[loss=0.1102, beats_loss=0.01073, ecapa_loss=0.0002346, whisper_loss=0.0971, over 14389.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0118, ecapa_loss=0.0002424, whisper_loss=0.09604, over 3892480.08 frames. 
], batch size: 55, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:38:43,677 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.176e+00 2024-08-10 15:38:48,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=624720.0, ans=0.0 2024-08-10 15:38:57,074 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 15:38:58,815 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-10 15:39:16,568 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 15:39:27,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.908e+01 3.221e+01 3.849e+01 6.109e+01, threshold=6.442e+01, percent-clipped=0.0 2024-08-10 15:39:37,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=625020.0, ans=0.125 2024-08-10 15:39:41,268 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 15:40:05,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4550, loss[loss=0.1033, beats_loss=0.01153, ecapa_loss=0.0002439, whisper_loss=0.0893, over 18467.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01178, ecapa_loss=0.0002445, whisper_loss=0.0955, over 3901203.37 frames. ], batch size: 73, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:40:16,168 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 11 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 15:40:51,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=625520.0, ans=0.0 2024-08-10 15:40:58,333 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 15:41:02,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=625520.0, ans=0.07 2024-08-10 15:41:04,724 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 15:41:16,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0 2024-08-10 15:41:23,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4600, loss[loss=0.1062, beats_loss=0.01185, ecapa_loss=0.0002311, whisper_loss=0.09205, over 17309.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01182, ecapa_loss=0.0002436, whisper_loss=0.09485, over 3849213.92 frames. ], batch size: 67, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:41:29,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=625720.0, ans=0.125 2024-08-10 15:41:35,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=625720.0, ans=0.125 2024-08-10 15:41:47,142 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 15:41:59,661 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 15:42:04,690 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-10 15:42:07,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.652e+01 3.147e+01 3.453e+01 6.048e+01, threshold=6.293e+01, percent-clipped=0.0 2024-08-10 15:42:17,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. 
limit=15.0 2024-08-10 15:42:24,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=626020.0, ans=0.0 2024-08-10 15:42:27,046 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 15:42:42,315 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4650, loss[loss=0.08432, beats_loss=0.01261, ecapa_loss=0.0001985, whisper_loss=0.06972, over 15016.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01184, ecapa_loss=0.0002438, whisper_loss=0.09469, over 3865899.52 frames. ], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:42:44,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626220.0, ans=0.1 2024-08-10 15:42:46,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-10 15:42:59,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=626320.0, ans=0.125 2024-08-10 15:43:05,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2024-08-10 15:43:08,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2024-08-10 15:43:20,403 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 15:43:32,075 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-10 15:43:51,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.21 vs. 
limit=12.0 2024-08-10 15:44:03,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4700, loss[loss=0.1131, beats_loss=0.01391, ecapa_loss=0.000215, whisper_loss=0.09707, over 23023.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01191, ecapa_loss=0.0002428, whisper_loss=0.09528, over 3851298.06 frames. ], batch size: 93, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:44:16,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626720.0, ans=0.125 2024-08-10 15:44:48,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.667e+01 3.139e+01 3.783e+01 7.574e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-10 15:44:48,904 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 15:45:12,205 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 15:45:13,646 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 15:45:24,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-10 15:45:24,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4750, loss[loss=0.11, beats_loss=0.01203, ecapa_loss=0.000253, whisper_loss=0.09543, over 21013.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01196, ecapa_loss=0.0002415, whisper_loss=0.09507, over 3870047.42 frames. 
], batch size: 84, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:45:28,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=627220.0, ans=0.125 2024-08-10 15:45:32,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=627220.0, ans=0.0 2024-08-10 15:45:33,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=627220.0, ans=0.2 2024-08-10 15:45:39,864 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 15:45:41,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627320.0, ans=0.1 2024-08-10 15:45:43,145 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 15:46:23,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=627520.0, ans=0.05 2024-08-10 15:46:37,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0 2024-08-10 15:46:40,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=627620.0, ans=0.1 2024-08-10 15:46:47,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4800, loss[loss=0.1186, beats_loss=0.009995, ecapa_loss=0.0003053, whisper_loss=0.1056, over 15040.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01197, ecapa_loss=0.0002433, whisper_loss=0.09474, over 3894173.65 frames. ], batch size: 64, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:46:54,014 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 15:46:56,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=627720.0, ans=0.125 2024-08-10 15:46:56,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627720.0, ans=0.1 2024-08-10 15:47:31,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=627920.0, ans=0.125 2024-08-10 15:47:35,606 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.946e+01 3.351e+01 4.117e+01 7.010e+01, threshold=6.703e+01, percent-clipped=2.0 2024-08-10 15:48:00,096 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 15:48:12,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4850, loss[loss=0.1142, beats_loss=0.01242, ecapa_loss=0.0002586, whisper_loss=0.09916, over 23275.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01205, ecapa_loss=0.0002435, whisper_loss=0.09392, over 3914465.42 frames. ], batch size: 94, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:48:24,671 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 15:48:26,402 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 15:48:26,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628220.0, ans=0.1 2024-08-10 15:48:31,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2024-08-10 15:48:37,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. 
limit=15.0 2024-08-10 15:48:44,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=628320.0, ans=0.125 2024-08-10 15:49:13,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.66 vs. limit=22.5 2024-08-10 15:49:35,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4900, loss[loss=0.1352, beats_loss=0.009087, ecapa_loss=0.0002847, whisper_loss=0.1233, over 15355.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01197, ecapa_loss=0.0002434, whisper_loss=0.0946, over 3892857.90 frames. ], batch size: 62, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:49:37,589 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 15:49:45,742 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 15:50:08,971 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-10 15:50:09,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628920.0, ans=0.1 2024-08-10 15:50:19,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.794e+01 3.081e+01 3.669e+01 6.406e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-10 15:50:32,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. 
limit=15.0 2024-08-10 15:50:36,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=629020.0, ans=0.125 2024-08-10 15:50:48,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=629120.0, ans=0.0 2024-08-10 15:50:48,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=629120.0, ans=8.0 2024-08-10 15:50:54,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 4950, loss[loss=0.1261, beats_loss=0.01117, ecapa_loss=0.0002057, whisper_loss=0.1128, over 21168.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01202, ecapa_loss=0.0002413, whisper_loss=0.0947, over 3878545.39 frames. ], batch size: 83, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:50:57,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=629220.0, ans=0.125 2024-08-10 15:51:00,707 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 15:51:05,086 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 15:51:14,080 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 31 from Vox, 21 fro AS 2024-08-10 15:51:19,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=629320.0, ans=0.125 2024-08-10 15:51:28,917 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 12 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 15:51:35,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=629420.0, ans=10.0 2024-08-10 15:51:44,165 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 15:51:45,877 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 15:52:15,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5000, loss[loss=0.11, beats_loss=0.01072, ecapa_loss=0.0002477, whisper_loss=0.09683, over 18225.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01193, ecapa_loss=0.0002431, whisper_loss=0.09493, over 3842939.77 frames. ], batch size: 73, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:52:21,767 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 15:52:35,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=629820.0, ans=0.0 2024-08-10 15:52:51,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-10 15:53:04,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.939e+01 3.385e+01 3.961e+01 1.332e+02, threshold=6.770e+01, percent-clipped=1.0 2024-08-10 15:53:20,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=630020.0, ans=0.0 2024-08-10 15:53:21,361 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 15:53:37,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5050, loss[loss=0.09226, beats_loss=0.01451, ecapa_loss=0.0002094, whisper_loss=0.07565, over 14683.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01208, ecapa_loss=0.0002439, whisper_loss=0.09452, over 3841747.72 frames. 
], batch size: 60, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:53:45,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=630220.0, ans=0.0 2024-08-10 15:53:55,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=630320.0, ans=0.125 2024-08-10 15:54:02,013 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 15:54:12,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=630420.0, ans=0.0 2024-08-10 15:54:30,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=630520.0, ans=0.0 2024-08-10 15:54:30,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5 2024-08-10 15:54:34,957 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 15:54:59,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5100, loss[loss=0.1304, beats_loss=0.01146, ecapa_loss=0.0002858, whisper_loss=0.1161, over 21702.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0121, ecapa_loss=0.0002433, whisper_loss=0.09482, over 3848750.16 frames. ], batch size: 88, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:55:12,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=630720.0, ans=0.125 2024-08-10 15:55:32,902 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 15:55:43,802 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
12 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 15:55:44,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.958e+01 3.434e+01 3.932e+01 6.642e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 15:55:50,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=631020.0, ans=0.2 2024-08-10 15:56:01,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=631020.0, ans=0.0 2024-08-10 15:56:06,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=631120.0, ans=0.0 2024-08-10 15:56:11,397 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 15:56:14,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=631120.0, ans=0.2 2024-08-10 15:56:18,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=631120.0, ans=0.125 2024-08-10 15:56:18,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2024-08-10 15:56:20,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5150, loss[loss=0.1051, beats_loss=0.01285, ecapa_loss=0.0002416, whisper_loss=0.08979, over 20425.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01205, ecapa_loss=0.0002419, whisper_loss=0.09455, over 3865790.87 frames. ], batch size: 84, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:56:37,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. 
limit=10.0 2024-08-10 15:56:57,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631420.0, ans=0.1 2024-08-10 15:57:37,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5200, loss[loss=0.1336, beats_loss=0.007926, ecapa_loss=0.0002778, whisper_loss=0.1229, over 15728.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01193, ecapa_loss=0.0002424, whisper_loss=0.09537, over 3888106.53 frames. ], batch size: 60, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:57:50,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=631720.0, ans=0.125 2024-08-10 15:57:56,754 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-10 15:58:00,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-08-10 15:58:08,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631920.0, ans=0.1 2024-08-10 15:58:15,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=631920.0, ans=0.0 2024-08-10 15:58:19,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.945e+01 3.443e+01 4.072e+01 7.195e+01, threshold=6.886e+01, percent-clipped=1.0 2024-08-10 15:58:39,181 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 15:58:45,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=632120.0, ans=0.125 2024-08-10 15:58:51,611 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5250, loss[loss=0.1371, beats_loss=0.01044, ecapa_loss=0.0002366, whisper_loss=0.1243, over 21236.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01194, ecapa_loss=0.0002429, whisper_loss=0.09446, over 3864201.39 frames. ], batch size: 80, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:58:53,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=632220.0, ans=0.125 2024-08-10 15:59:07,326 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 15:59:12,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=632320.0, ans=0.125 2024-08-10 15:59:14,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=632320.0, ans=0.0 2024-08-10 15:59:26,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632420.0, ans=0.1 2024-08-10 15:59:42,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-10 15:59:46,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.99 vs. 
limit=15.0 2024-08-10 15:59:57,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=632620.0, ans=0.125 2024-08-10 16:00:07,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5300, loss[loss=0.09587, beats_loss=0.0112, ecapa_loss=0.0002698, whisper_loss=0.08197, over 15170.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01181, ecapa_loss=0.000242, whisper_loss=0.09542, over 3879239.85 frames. ], batch size: 66, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:00:14,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=632720.0, ans=0.125 2024-08-10 16:00:18,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632720.0, ans=0.1 2024-08-10 16:00:19,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=632820.0, ans=0.0 2024-08-10 16:00:29,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.87 vs. 
limit=22.5 2024-08-10 16:00:33,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=632820.0, ans=0.125 2024-08-10 16:00:47,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.840e+01 3.204e+01 3.763e+01 6.547e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 16:00:50,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=633020.0, ans=0.2 2024-08-10 16:00:59,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=633020.0, ans=0.2 2024-08-10 16:01:11,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-10 16:01:11,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=633120.0, ans=0.125 2024-08-10 16:01:18,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5350, loss[loss=0.1122, beats_loss=0.01266, ecapa_loss=0.0002697, whisper_loss=0.09689, over 15439.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01189, ecapa_loss=0.0002423, whisper_loss=0.0944, over 3850550.76 frames. ], batch size: 64, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:01:25,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=633220.0, ans=0.025 2024-08-10 16:01:29,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=633220.0, ans=0.125 2024-08-10 16:01:39,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=633320.0, ans=0.1 2024-08-10 16:01:41,493 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 16:02:18,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=633620.0, ans=0.125 2024-08-10 16:02:28,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5400, loss[loss=0.1045, beats_loss=0.01127, ecapa_loss=0.0003562, whisper_loss=0.08964, over 20143.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01185, ecapa_loss=0.0002426, whisper_loss=0.09504, over 3851436.15 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:02:36,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=633720.0, ans=0.125 2024-08-10 16:02:40,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2024-08-10 16:02:44,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=633820.0, ans=0.1 2024-08-10 16:02:45,179 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 16:02:45,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=633820.0, ans=0.2 2024-08-10 16:02:45,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2024-08-10 16:02:49,662 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 16:02:51,402 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:02:55,244 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 16:02:55,511 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:03:01,147 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 16:03:01,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=633920.0, ans=0.125 2024-08-10 16:03:02,283 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-10 16:03:07,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.970e+01 3.287e+01 3.858e+01 5.350e+01, threshold=6.575e+01, percent-clipped=0.0 2024-08-10 16:03:20,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-10 16:03:23,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=634120.0, ans=0.0 2024-08-10 16:03:30,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2024-08-10 16:03:30,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-10 16:03:37,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5450, loss[loss=0.1154, beats_loss=0.01082, ecapa_loss=0.000257, whisper_loss=0.102, over 22593.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01187, ecapa_loss=0.0002423, whisper_loss=0.09522, over 3875643.31 frames. 
], batch size: 90, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:03:40,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=634220.0, ans=0.07 2024-08-10 16:03:41,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=12.0 2024-08-10 16:03:48,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=634220.0, ans=0.07 2024-08-10 16:03:53,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=634320.0, ans=0.0 2024-08-10 16:04:06,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634420.0, ans=0.1 2024-08-10 16:04:35,137 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 16:04:36,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=634620.0, ans=0.0 2024-08-10 16:04:41,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=634620.0, ans=0.05 2024-08-10 16:04:44,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5500, loss[loss=0.1114, beats_loss=0.01118, ecapa_loss=0.00022, whisper_loss=0.09803, over 21946.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01194, ecapa_loss=0.0002406, whisper_loss=0.09451, over 3878747.16 frames. 
], batch size: 87, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:04:45,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=634720.0, ans=0.125 2024-08-10 16:04:46,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=634720.0, ans=0.0 2024-08-10 16:04:47,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-10 16:04:58,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=634820.0, ans=0.125 2024-08-10 16:04:59,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=634820.0, ans=0.125 2024-08-10 16:05:04,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=634820.0, ans=0.2 2024-08-10 16:05:04,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=634820.0, ans=0.125 2024-08-10 16:05:09,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=634820.0, ans=0.125 2024-08-10 16:05:11,412 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 16:05:11,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=634920.0, ans=0.125 2024-08-10 16:05:14,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2024-08-10 16:05:21,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=634920.0, ans=0.125 2024-08-10 16:05:22,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.955e+01 3.201e+01 3.849e+01 6.033e+01, threshold=6.402e+01, percent-clipped=0.0 2024-08-10 16:05:26,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=635020.0, ans=0.0 2024-08-10 16:05:33,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=635020.0, ans=0.125 2024-08-10 16:05:33,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2024-08-10 16:05:41,590 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 16:05:49,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=635120.0, ans=0.0 2024-08-10 16:05:52,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5550, loss[loss=0.1228, beats_loss=0.01038, ecapa_loss=0.0002194, whisper_loss=0.1102, over 19561.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01189, ecapa_loss=0.0002421, whisper_loss=0.09563, over 3929572.77 frames. ], batch size: 77, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:06:22,054 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 16:06:52,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=635620.0, ans=0.125 2024-08-10 16:06:58,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5600, loss[loss=0.1302, beats_loss=0.0108, ecapa_loss=0.0002117, whisper_loss=0.1172, over 23621.00 frames. 
], tot_loss[loss=0.1096, beats_loss=0.01196, ecapa_loss=0.0002413, whisper_loss=0.09523, over 3921058.80 frames. ], batch size: 88, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:07:01,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=635720.0, ans=0.0 2024-08-10 16:07:08,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635720.0, ans=0.1 2024-08-10 16:07:09,261 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 14 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-10 16:07:12,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0 2024-08-10 16:07:14,659 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 16:07:16,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=635820.0, ans=0.2 2024-08-10 16:07:20,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=635820.0, ans=0.2 2024-08-10 16:07:30,534 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 16:07:35,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.708e+01 3.041e+01 3.496e+01 5.299e+01, threshold=6.081e+01, percent-clipped=0.0 2024-08-10 16:07:36,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=635920.0, ans=0.0 2024-08-10 16:07:38,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-10 16:08:01,110 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 16:08:04,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5650, loss[loss=0.111, beats_loss=0.01282, ecapa_loss=0.0002023, whisper_loss=0.09611, over 21923.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01205, ecapa_loss=0.0002411, whisper_loss=0.09393, over 3931714.55 frames. ], batch size: 85, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:08:43,744 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.504e+03 2024-08-10 16:08:46,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636520.0, ans=0.1 2024-08-10 16:09:03,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636620.0, ans=0.1 2024-08-10 16:09:10,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5700, loss[loss=0.1214, beats_loss=0.008379, ecapa_loss=0.0002196, whisper_loss=0.1108, over 15293.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01206, ecapa_loss=0.0002437, whisper_loss=0.09369, over 3918347.52 frames. ], batch size: 57, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:09:10,652 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 16:09:34,664 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
22 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 16:09:48,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.928e+01 3.301e+01 4.183e+01 7.157e+01, threshold=6.602e+01, percent-clipped=2.0 2024-08-10 16:09:48,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=636920.0, ans=0.0 2024-08-10 16:09:57,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2024-08-10 16:10:05,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=637120.0, ans=0.0 2024-08-10 16:10:13,553 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 16:10:19,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5750, loss[loss=0.1196, beats_loss=0.01003, ecapa_loss=0.0002035, whisper_loss=0.1076, over 15636.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01198, ecapa_loss=0.000244, whisper_loss=0.09454, over 3906236.90 frames. ], batch size: 57, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:10:23,613 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 16:10:59,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=637520.0, ans=0.125 2024-08-10 16:11:02,245 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 16:11:04,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=637520.0, ans=0.125 2024-08-10 16:11:15,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=637620.0, ans=0.125 2024-08-10 16:11:28,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5800, loss[loss=0.1237, beats_loss=0.01047, ecapa_loss=0.0002948, whisper_loss=0.1102, over 21663.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01191, ecapa_loss=0.0002442, whisper_loss=0.09475, over 3908000.72 frames. ], batch size: 88, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:11:30,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=637720.0, ans=0.125 2024-08-10 16:11:42,052 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 16:11:46,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=637820.0, ans=0.125 2024-08-10 16:11:52,752 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 16:12:01,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=637920.0, ans=0.125 2024-08-10 16:12:07,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=637920.0, ans=0.0 2024-08-10 16:12:07,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.715e+01 3.192e+01 3.464e+01 4.938e+01, threshold=6.385e+01, percent-clipped=0.0 2024-08-10 16:12:18,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.45 vs. 
limit=22.5 2024-08-10 16:12:24,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=638120.0, ans=0.025 2024-08-10 16:12:38,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5850, loss[loss=0.1293, beats_loss=0.01207, ecapa_loss=0.0001948, whisper_loss=0.1152, over 17919.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01198, ecapa_loss=0.0002428, whisper_loss=0.09472, over 3904524.81 frames. ], batch size: 69, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:12:54,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=638320.0, ans=0.2 2024-08-10 16:13:01,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=638320.0, ans=0.125 2024-08-10 16:13:08,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=638420.0, ans=0.125 2024-08-10 16:13:18,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0 2024-08-10 16:13:22,290 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 16:13:26,468 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 16:13:33,319 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 16:13:35,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=638620.0, ans=0.5 2024-08-10 16:13:40,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=638620.0, ans=0.1 2024-08-10 16:13:44,173 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 16:13:48,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5900, loss[loss=0.09175, beats_loss=0.01375, ecapa_loss=0.0002867, whisper_loss=0.07514, over 20603.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01199, ecapa_loss=0.0002422, whisper_loss=0.09486, over 3902163.29 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:14:01,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.94 vs. limit=15.0 2024-08-10 16:14:25,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=638920.0, ans=0.0 2024-08-10 16:14:26,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.959e+01 3.304e+01 3.845e+01 6.831e+01, threshold=6.608e+01, percent-clipped=1.0 2024-08-10 16:14:37,907 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 16:14:40,514 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 11 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 16:14:43,151 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 16:14:51,533 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 16:14:52,965 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 16:14:56,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 5950, loss[loss=0.11, beats_loss=0.01343, ecapa_loss=0.0002218, whisper_loss=0.09431, over 23788.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01199, ecapa_loss=0.0002426, whisper_loss=0.0949, over 3921442.43 frames. ], batch size: 92, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:14:59,612 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
8 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 16:15:18,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=639320.0, ans=0.0 2024-08-10 16:15:23,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=639420.0, ans=0.0 2024-08-10 16:15:24,672 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 16:15:51,530 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 16:15:51,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=639620.0, ans=0.0 2024-08-10 16:16:07,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6000, loss[loss=0.1413, beats_loss=0.01168, ecapa_loss=0.0002244, whisper_loss=0.1274, over 23238.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01199, ecapa_loss=0.0002419, whisper_loss=0.09524, over 3927815.59 frames. ], batch size: 90, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:16:07,868 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 16:16:49,372 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on ASR_libri: loss=0.2642, beats_loss=0, ecapa_loss=0.0007414, whisper_loss=0.2567, over 922467.00 frames. 2024-08-10 16:17:08,493 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on SV_voxceleb1: loss=0.006164, beats_loss=0, ecapa_loss=0.0006164, whisper_loss=0, over 939242.00 frames. 2024-08-10 16:19:02,564 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on AT_audioset: loss=0.02682, beats_loss=0.02682, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 16:19:02,568 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 16:19:09,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639720.0, ans=0.1 2024-08-10 16:19:17,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=639820.0, ans=0.0 2024-08-10 16:19:26,745 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 16:19:38,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=639920.0, ans=0.125 2024-08-10 16:19:45,218 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.869e+01 3.209e+01 3.631e+01 6.157e+01, threshold=6.418e+01, percent-clipped=0.0 2024-08-10 16:19:55,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=640020.0, ans=0.125 2024-08-10 16:20:02,413 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.142e-02 2024-08-10 16:20:15,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6050, loss[loss=0.1113, beats_loss=0.0108, ecapa_loss=0.000251, whisper_loss=0.09798, over 18229.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01197, ecapa_loss=0.0002419, whisper_loss=0.09541, over 3903672.16 frames. ], batch size: 74, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:20:22,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-08-10 16:20:29,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. 
limit=15.0 2024-08-10 16:20:33,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640320.0, ans=0.1 2024-08-10 16:20:36,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2024-08-10 16:20:37,064 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 16:20:41,371 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 16:20:41,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=640320.0, ans=0.125 2024-08-10 16:20:47,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=640420.0, ans=0.125 2024-08-10 16:20:52,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=640420.0, ans=0.0 2024-08-10 16:20:55,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-08-10 16:21:02,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=640520.0, ans=0.125 2024-08-10 16:21:05,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2024-08-10 16:21:14,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2024-08-10 16:21:19,402 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 16:21:21,423 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 16:21:32,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6100, loss[loss=0.1121, beats_loss=0.01339, ecapa_loss=0.0002076, whisper_loss=0.0966, over 16805.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01199, ecapa_loss=0.0002416, whisper_loss=0.09538, over 3938697.71 frames. ], batch size: 66, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:21:32,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=640720.0, ans=10.0 2024-08-10 16:22:12,724 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 16:22:15,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.045e+01 3.489e+01 4.204e+01 8.442e+01, threshold=6.977e+01, percent-clipped=4.0 2024-08-10 16:22:26,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=641020.0, ans=0.125 2024-08-10 16:22:32,345 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 16:22:48,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6150, loss[loss=0.09475, beats_loss=0.01537, ecapa_loss=0.0002299, whisper_loss=0.07708, over 21862.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01204, ecapa_loss=0.0002418, whisper_loss=0.09513, over 3944800.03 frames. ], batch size: 94, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:23:25,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=641420.0, ans=0.1 2024-08-10 16:23:29,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. 
limit=6.0 2024-08-10 16:23:35,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=641520.0, ans=0.125 2024-08-10 16:23:42,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=641520.0, ans=0.07 2024-08-10 16:23:43,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641520.0, ans=0.0 2024-08-10 16:23:45,098 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 16:23:58,189 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 16:24:02,639 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 16:24:03,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6200, loss[loss=0.07168, beats_loss=0.01252, ecapa_loss=0.000239, whisper_loss=0.05677, over 14769.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.012, ecapa_loss=0.0002413, whisper_loss=0.0953, over 3927264.66 frames. ], batch size: 58, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:24:18,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=641820.0, ans=0.2 2024-08-10 16:24:37,238 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 16:24:38,431 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 16:24:42,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.777e+01 3.185e+01 3.780e+01 9.777e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 16:24:47,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=642020.0, ans=0.0 2024-08-10 16:24:50,507 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 16:25:03,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642120.0, ans=0.125 2024-08-10 16:25:16,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6250, loss[loss=0.085, beats_loss=0.01224, ecapa_loss=0.0002216, whisper_loss=0.07054, over 17447.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01195, ecapa_loss=0.0002409, whisper_loss=0.09506, over 3938337.19 frames. ], batch size: 71, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:25:38,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=642320.0, ans=0.125 2024-08-10 16:26:01,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=642520.0, ans=0.125 2024-08-10 16:26:17,692 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 16:26:18,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.23 vs. limit=22.5 2024-08-10 16:26:23,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. 
limit=15.0 2024-08-10 16:26:31,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6300, loss[loss=0.113, beats_loss=0.01394, ecapa_loss=0.0002601, whisper_loss=0.09651, over 22521.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01198, ecapa_loss=0.0002405, whisper_loss=0.09509, over 3916295.84 frames. ], batch size: 95, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:26:40,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=642720.0, ans=0.125 2024-08-10 16:26:42,861 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.936e-02 2024-08-10 16:27:14,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.923e+01 3.266e+01 3.609e+01 6.240e+01, threshold=6.531e+01, percent-clipped=0.0 2024-08-10 16:27:22,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643020.0, ans=0.0 2024-08-10 16:27:42,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=643120.0, ans=0.2 2024-08-10 16:27:46,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6350, loss[loss=0.1202, beats_loss=0.01149, ecapa_loss=0.0001666, whisper_loss=0.107, over 14610.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01198, ecapa_loss=0.0002398, whisper_loss=0.09436, over 3904492.81 frames. ], batch size: 54, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:27:48,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.23 vs. 
limit=12.0 2024-08-10 16:27:49,249 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.130e-02 2024-08-10 16:27:50,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=643220.0, ans=0.015 2024-08-10 16:27:53,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643220.0, ans=0.0 2024-08-10 16:27:53,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=643220.0, ans=0.125 2024-08-10 16:28:03,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=643320.0, ans=0.125 2024-08-10 16:28:08,597 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 16:28:15,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643420.0, ans=0.1 2024-08-10 16:28:16,630 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 16:28:21,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=643420.0, ans=0.07 2024-08-10 16:28:25,153 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 16:28:39,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=643520.0, ans=0.0 2024-08-10 16:28:55,375 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 16:28:55,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643620.0, ans=0.1 2024-08-10 16:28:57,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6400, loss[loss=0.09656, beats_loss=0.01283, ecapa_loss=0.0002138, whisper_loss=0.08159, over 20225.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01191, ecapa_loss=0.0002398, whisper_loss=0.09538, over 3913909.81 frames. ], batch size: 79, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:29:15,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643820.0, ans=0.125 2024-08-10 16:29:22,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-08-10 16:29:24,998 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 16:29:36,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.802e+01 3.219e+01 3.654e+01 6.592e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-10 16:29:50,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2024-08-10 16:29:54,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644120.0, ans=0.1 2024-08-10 16:30:07,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6450, loss[loss=0.08474, beats_loss=0.01143, ecapa_loss=0.0002941, whisper_loss=0.07037, over 15013.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01188, ecapa_loss=0.0002402, whisper_loss=0.0956, over 3906325.94 frames. 
], batch size: 62, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:30:07,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=644220.0, ans=0.125 2024-08-10 16:30:11,241 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 16:30:15,104 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 16:30:20,387 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 16:30:20,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=644320.0, ans=0.0 2024-08-10 16:30:20,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-10 16:30:26,915 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 16:30:53,085 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 16:31:00,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=644620.0, ans=0.125 2024-08-10 16:31:13,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=644720.0, ans=0.0 2024-08-10 16:31:14,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6500, loss[loss=0.1049, beats_loss=0.01129, ecapa_loss=0.0002386, whisper_loss=0.09126, over 22780.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01184, ecapa_loss=0.0002387, whisper_loss=0.09621, over 3916110.69 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:31:16,763 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 16:31:20,583 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 16:31:22,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-10 16:31:40,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=644820.0, ans=0.0 2024-08-10 16:31:44,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=644920.0, ans=0.2 2024-08-10 16:31:46,886 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 16:31:52,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=644920.0, ans=0.125 2024-08-10 16:31:53,162 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.031e+01 3.251e+01 3.712e+01 6.418e+01, threshold=6.501e+01, percent-clipped=0.0 2024-08-10 16:31:55,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=645020.0, ans=0.1 2024-08-10 16:32:01,314 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 16:32:01,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645020.0, ans=0.1 2024-08-10 16:32:23,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6550, loss[loss=0.1235, beats_loss=0.01316, ecapa_loss=0.0002567, whisper_loss=0.1078, over 22101.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01188, ecapa_loss=0.0002403, whisper_loss=0.09623, over 3932437.61 frames. 
], batch size: 91, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:32:30,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0 2024-08-10 16:32:34,919 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 13 from LS+wenet, 31 from Vox, 20 fro AS 2024-08-10 16:32:52,832 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 16:33:31,334 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 16:33:32,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6600, loss[loss=0.09575, beats_loss=0.01342, ecapa_loss=0.000259, whisper_loss=0.07974, over 21708.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01194, ecapa_loss=0.0002406, whisper_loss=0.09587, over 3956574.67 frames. ], batch size: 94, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:33:35,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645720.0, ans=0.0 2024-08-10 16:33:45,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645820.0, ans=0.125 2024-08-10 16:33:53,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645820.0, ans=0.1 2024-08-10 16:34:08,156 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 16:34:11,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.864e+01 3.355e+01 3.985e+01 6.693e+01, threshold=6.710e+01, percent-clipped=1.0 2024-08-10 16:34:34,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=646120.0, ans=0.125 2024-08-10 16:34:38,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-10 16:34:43,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-10 16:34:43,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6650, loss[loss=0.1127, beats_loss=0.01484, ecapa_loss=0.0002046, whisper_loss=0.09585, over 22710.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01189, ecapa_loss=0.0002405, whisper_loss=0.09588, over 3953829.89 frames. 
], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:34:45,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=646220.0, ans=0.0 2024-08-10 16:34:49,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=646220.0, ans=0.0 2024-08-10 16:34:56,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:35:21,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=646420.0, ans=0.125 2024-08-10 16:35:25,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=646520.0, ans=0.07 2024-08-10 16:35:28,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646520.0, ans=0.1 2024-08-10 16:35:37,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646520.0, ans=0.1 2024-08-10 16:35:55,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6700, loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.0001907, whisper_loss=0.09446, over 22514.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0119, ecapa_loss=0.0002385, whisper_loss=0.09584, over 3954459.97 frames. ], batch size: 86, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:36:17,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2024-08-10 16:36:21,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=646820.0, ans=0.05 2024-08-10 16:36:22,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=646920.0, ans=0.025 2024-08-10 16:36:34,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.879e+01 3.180e+01 3.709e+01 5.171e+01, threshold=6.361e+01, percent-clipped=0.0 2024-08-10 16:36:46,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=647020.0, ans=0.0 2024-08-10 16:36:49,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=647020.0, ans=0.0 2024-08-10 16:37:03,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=647120.0, ans=0.1 2024-08-10 16:37:03,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.00 vs. limit=15.0 2024-08-10 16:37:04,314 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 32 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 16:37:05,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6750, loss[loss=0.1368, beats_loss=0.009224, ecapa_loss=0.0002155, whisper_loss=0.1254, over 18952.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0118, ecapa_loss=0.0002394, whisper_loss=0.09631, over 3904897.29 frames. ], batch size: 72, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:37:06,811 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 16:37:31,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=647420.0, ans=0.125 2024-08-10 16:37:36,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647420.0, ans=0.1 2024-08-10 16:37:38,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=647420.0, ans=0.125 2024-08-10 16:37:51,796 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 16:37:52,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0 2024-08-10 16:38:06,551 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 16:38:12,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6800, loss[loss=0.08835, beats_loss=0.01234, ecapa_loss=0.0002474, whisper_loss=0.07353, over 18527.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.0119, ecapa_loss=0.0002391, whisper_loss=0.09613, over 3923800.56 frames. ], batch size: 76, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:38:20,917 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 16:38:44,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.19 vs. 
limit=15.0 2024-08-10 16:38:46,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=647920.0, ans=0.125 2024-08-10 16:38:53,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.892e+01 3.322e+01 4.059e+01 7.063e+01, threshold=6.643e+01, percent-clipped=1.0 2024-08-10 16:39:06,058 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:39:22,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648220.0, ans=0.1 2024-08-10 16:39:22,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6850, loss[loss=0.09494, beats_loss=0.01348, ecapa_loss=0.0002632, whisper_loss=0.07883, over 18806.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01183, ecapa_loss=0.0002393, whisper_loss=0.09581, over 3894927.07 frames. ], batch size: 78, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:39:38,522 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 16:39:40,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=648320.0, ans=0.07 2024-08-10 16:39:44,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=648320.0, ans=0.125 2024-08-10 16:39:53,560 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 16:40:10,579 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 16:40:11,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648520.0, ans=0.125 2024-08-10 16:40:24,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648620.0, ans=0.125 2024-08-10 16:40:31,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6900, loss[loss=0.09671, beats_loss=0.01281, ecapa_loss=0.0002334, whisper_loss=0.08157, over 20643.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01192, ecapa_loss=0.0002396, whisper_loss=0.09483, over 3925205.76 frames. ], batch size: 84, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:40:34,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-10 16:40:34,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-10 16:40:51,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=648820.0, ans=0.2 2024-08-10 16:40:58,699 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 16:40:59,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=648920.0, ans=0.2 2024-08-10 16:41:00,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. 
limit=15.0 2024-08-10 16:41:10,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.858e+01 3.304e+01 3.695e+01 5.634e+01, threshold=6.608e+01, percent-clipped=0.0 2024-08-10 16:41:34,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-10 16:41:40,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 6950, loss[loss=0.08408, beats_loss=0.0155, ecapa_loss=0.0002055, whisper_loss=0.06652, over 21879.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.0119, ecapa_loss=0.0002399, whisper_loss=0.09516, over 3911039.92 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:41:40,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=649220.0, ans=0.2 2024-08-10 16:41:41,787 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 16:41:54,760 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-10 16:41:58,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=649320.0, ans=0.125 2024-08-10 16:42:03,510 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-10 16:42:18,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=649420.0, ans=0.125 2024-08-10 16:42:19,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=649420.0, ans=0.0 2024-08-10 16:42:32,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.53 vs. limit=22.5 2024-08-10 16:42:47,923 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
29 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 16:42:49,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7000, loss[loss=0.1149, beats_loss=0.01232, ecapa_loss=0.0002317, whisper_loss=0.1003, over 20579.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.00024, whisper_loss=0.0949, over 3911866.97 frames. ], batch size: 83, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:42:52,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0 2024-08-10 16:43:14,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=649920.0, ans=0.0 2024-08-10 16:43:18,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=649920.0, ans=0.0 2024-08-10 16:43:22,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-08-10 16:43:25,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.800e+01 3.263e+01 3.998e+01 9.402e+01, threshold=6.527e+01, percent-clipped=1.0 2024-08-10 16:43:34,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-10 16:43:35,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=650020.0, ans=0.125 2024-08-10 16:43:36,008 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 16:43:50,012 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 16:43:55,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7050, loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0002813, whisper_loss=0.09078, over 16271.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.0002396, whisper_loss=0.0949, over 3917582.35 frames. ], batch size: 69, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:43:55,242 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 16:44:01,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=650220.0, ans=0.2 2024-08-10 16:44:18,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=650320.0, ans=0.125 2024-08-10 16:44:41,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=650520.0, ans=0.125 2024-08-10 16:44:42,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=650520.0, ans=0.0 2024-08-10 16:44:51,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=650620.0, ans=0.125 2024-08-10 16:45:00,573 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 16:45:01,633 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7100, loss[loss=0.104, beats_loss=0.01286, ecapa_loss=0.0002814, whisper_loss=0.08835, over 20432.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0119, ecapa_loss=0.0002394, whisper_loss=0.09422, over 3887297.95 frames. 
], batch size: 86, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:45:06,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=650720.0, ans=0.125 2024-08-10 16:45:21,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=650820.0, ans=0.05 2024-08-10 16:45:22,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=650820.0, ans=0.125 2024-08-10 16:45:32,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-10 16:45:35,526 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 16:45:39,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.645e+01 3.161e+01 3.535e+01 5.692e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 16:46:04,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=651120.0, ans=0.2 2024-08-10 16:46:08,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7150, loss[loss=0.09348, beats_loss=0.01286, ecapa_loss=0.0002194, whisper_loss=0.07842, over 22573.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01191, ecapa_loss=0.0002376, whisper_loss=0.09381, over 3905289.80 frames. 
], batch size: 93, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:46:13,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=651220.0, ans=0.2 2024-08-10 16:46:14,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=651220.0, ans=0.125 2024-08-10 16:46:14,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651220.0, ans=0.125 2024-08-10 16:46:22,491 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 16:46:34,870 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 16:46:36,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=651420.0, ans=0.125 2024-08-10 16:46:50,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=651520.0, ans=0.125 2024-08-10 16:46:57,873 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 16:47:17,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7200, loss[loss=0.1203, beats_loss=0.01147, ecapa_loss=0.0002712, whisper_loss=0.1061, over 22015.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01187, ecapa_loss=0.0002377, whisper_loss=0.0943, over 3923126.73 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:47:29,108 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 16:47:44,522 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 16:47:56,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.798e+01 3.257e+01 4.005e+01 1.167e+02, threshold=6.513e+01, percent-clipped=2.0 2024-08-10 16:47:56,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=651920.0, ans=0.125 2024-08-10 16:48:13,745 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 16:48:26,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7250, loss[loss=0.1321, beats_loss=0.007699, ecapa_loss=0.0003037, whisper_loss=0.1213, over 21513.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01184, ecapa_loss=0.0002396, whisper_loss=0.09472, over 3931382.36 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:48:56,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652420.0, ans=0.1 2024-08-10 16:49:28,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652620.0, ans=0.1 2024-08-10 16:49:36,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=652620.0, ans=0.125 2024-08-10 16:49:38,483 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7300, loss[loss=0.09977, beats_loss=0.01157, ecapa_loss=0.0002787, whisper_loss=0.08542, over 22566.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01181, ecapa_loss=0.0002407, whisper_loss=0.09527, over 3933718.61 frames. ], batch size: 91, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:49:40,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. 
limit=15.0 2024-08-10 16:50:16,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.741e+01 3.048e+01 3.552e+01 4.958e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-10 16:50:28,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=653020.0, ans=0.2 2024-08-10 16:50:37,685 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 16:50:44,321 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 16:50:46,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7350, loss[loss=0.1231, beats_loss=0.006885, ecapa_loss=0.000324, whisper_loss=0.113, over 16670.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01176, ecapa_loss=0.0002419, whisper_loss=0.09558, over 3929979.84 frames. ], batch size: 66, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:50:56,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653220.0, ans=0.1 2024-08-10 16:51:11,268 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 16:51:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=653420.0, ans=0.125 2024-08-10 16:51:25,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.29 vs. limit=15.0 2024-08-10 16:51:26,054 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 16:51:34,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=653520.0, ans=0.125 2024-08-10 16:51:41,908 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.338e+05 2024-08-10 16:51:44,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=653620.0, ans=0.05 2024-08-10 16:51:52,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=653620.0, ans=0.2 2024-08-10 16:51:57,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7400, loss[loss=0.1019, beats_loss=0.01445, ecapa_loss=0.0002126, whisper_loss=0.0853, over 22219.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01184, ecapa_loss=0.0002419, whisper_loss=0.09562, over 3920181.19 frames. ], batch size: 91, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:52:01,342 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 16:52:04,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2024-08-10 16:52:25,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=653920.0, ans=0.125 2024-08-10 16:52:29,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-08-10 16:52:30,368 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 16:52:35,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.867e+01 3.212e+01 3.714e+01 5.750e+01, threshold=6.424e+01, percent-clipped=0.0 2024-08-10 16:52:54,868 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:53:04,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7450, loss[loss=0.1263, beats_loss=0.01154, ecapa_loss=0.000255, whisper_loss=0.1122, over 19052.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01189, ecapa_loss=0.0002416, whisper_loss=0.09503, over 3916263.80 frames. ], batch size: 73, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:53:27,674 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 16:53:43,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-10 16:53:53,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2024-08-10 16:53:56,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=654620.0, ans=0.125 2024-08-10 16:53:58,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=28.05 vs. limit=22.5 2024-08-10 16:54:11,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7500, loss[loss=0.1231, beats_loss=0.0114, ecapa_loss=0.000246, whisper_loss=0.1092, over 21938.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01182, ecapa_loss=0.0002415, whisper_loss=0.09541, over 3922849.30 frames. ], batch size: 88, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:54:12,529 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 16:54:12,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=654720.0, ans=0.0 2024-08-10 16:54:16,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=654720.0, ans=0.0 2024-08-10 16:54:36,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654920.0, ans=0.1 2024-08-10 16:54:48,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.920e+01 3.227e+01 3.863e+01 6.212e+01, threshold=6.454e+01, percent-clipped=0.0 2024-08-10 16:54:48,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=654920.0, ans=0.125 2024-08-10 16:54:55,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=655020.0, ans=0.09899494936611666 2024-08-10 16:55:17,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7550, loss[loss=0.121, beats_loss=0.0117, ecapa_loss=0.0002452, whisper_loss=0.1068, over 16978.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01191, ecapa_loss=0.0002411, whisper_loss=0.09443, over 3897812.51 frames. ], batch size: 64, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:55:17,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=655220.0, ans=0.0 2024-08-10 16:55:30,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655320.0, ans=0.125 2024-08-10 16:55:32,394 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 16:55:34,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=655320.0, ans=0.125 2024-08-10 16:55:41,381 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 16:55:46,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-10 16:55:55,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=655420.0, ans=0.125 2024-08-10 16:56:01,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=655520.0, ans=0.0 2024-08-10 16:56:07,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=655520.0, ans=0.125 2024-08-10 16:56:24,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7600, loss[loss=0.1063, beats_loss=0.009624, ecapa_loss=0.0003301, whisper_loss=0.09336, over 13110.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01184, ecapa_loss=0.000243, whisper_loss=0.09447, over 3870638.04 frames. ], batch size: 54, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:56:25,604 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 16:56:37,858 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 16:56:45,832 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 16:57:01,093 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 16:57:02,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.834e+01 3.413e+01 3.883e+01 8.700e+01, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 16:57:28,143 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 16:57:31,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7650, loss[loss=0.09151, beats_loss=0.01291, ecapa_loss=0.000209, whisper_loss=0.07651, over 21277.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002442, whisper_loss=0.09473, over 3869683.55 frames. ], batch size: 84, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:57:49,229 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-10 16:57:59,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=656420.0, ans=0.025 2024-08-10 16:58:08,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-08-10 16:58:15,587 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 16:58:20,506 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 34 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-10 16:58:23,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=656620.0, ans=0.2 2024-08-10 16:58:37,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7700, loss[loss=0.1008, beats_loss=0.01322, ecapa_loss=0.0002141, whisper_loss=0.08545, over 15602.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01184, ecapa_loss=0.0002432, whisper_loss=0.09448, over 3879791.28 frames. 
], batch size: 60, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:58:39,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-08-10 16:58:46,861 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 37 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 16:58:48,429 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 16:58:48,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=656720.0, ans=0.125 2024-08-10 16:58:53,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=656820.0, ans=0.0 2024-08-10 16:58:58,379 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 16:59:03,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=22.5 2024-08-10 16:59:15,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.911e+01 3.362e+01 3.849e+01 6.405e+01, threshold=6.723e+01, percent-clipped=0.0 2024-08-10 16:59:16,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=656920.0, ans=15.0 2024-08-10 16:59:21,795 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 16:59:25,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=657020.0, ans=0.125 2024-08-10 16:59:29,724 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 16:59:30,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2024-08-10 16:59:33,951 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 16:59:35,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=657120.0, ans=0.125 2024-08-10 16:59:43,194 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 16:59:44,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7750, loss[loss=0.09154, beats_loss=0.01351, ecapa_loss=0.0002824, whisper_loss=0.07521, over 17470.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002416, whisper_loss=0.09477, over 3870526.26 frames. ], batch size: 76, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:00:03,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-08-10 17:00:15,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=657420.0, ans=0.125 2024-08-10 17:00:18,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=657420.0, ans=0.125 2024-08-10 17:00:20,198 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 17:00:35,873 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 17:00:43,479 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 17:00:49,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657620.0, ans=0.1 2024-08-10 17:00:51,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7800, loss[loss=0.1008, beats_loss=0.008752, ecapa_loss=0.0002724, whisper_loss=0.08929, over 20704.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01185, ecapa_loss=0.0002409, whisper_loss=0.09423, over 3886119.70 frames. ], batch size: 80, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:01:28,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.784e+01 3.058e+01 3.552e+01 6.431e+01, threshold=6.115e+01, percent-clipped=0.0 2024-08-10 17:01:32,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=658020.0, ans=0.2 2024-08-10 17:01:40,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=658020.0, ans=0.0 2024-08-10 17:01:43,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=658120.0, ans=0.2 2024-08-10 17:01:46,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658120.0, ans=0.125 2024-08-10 17:01:53,493 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 17:01:57,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7850, loss[loss=0.08515, beats_loss=0.0124, ecapa_loss=0.0002811, whisper_loss=0.06994, over 19477.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01193, ecapa_loss=0.00024, whisper_loss=0.09372, over 3883578.50 frames. ], batch size: 85, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:02:04,091 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 17:02:05,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=658220.0, ans=0.1 2024-08-10 17:02:05,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=658220.0, ans=0.1 2024-08-10 17:02:31,223 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-10 17:02:51,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=658620.0, ans=0.125 2024-08-10 17:02:59,698 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-10 17:03:05,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7900, loss[loss=0.114, beats_loss=0.0108, ecapa_loss=0.0002537, whisper_loss=0.1007, over 20835.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01197, ecapa_loss=0.0002393, whisper_loss=0.09412, over 3903532.67 frames. ], batch size: 83, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:03:05,182 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 17:03:19,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=658820.0, ans=0.025 2024-08-10 17:03:34,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. 
limit=22.5 2024-08-10 17:03:37,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=658920.0, ans=0.0 2024-08-10 17:03:42,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.841e+01 3.204e+01 3.801e+01 5.785e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 17:04:08,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=659120.0, ans=0.0 2024-08-10 17:04:09,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=659120.0, ans=0.125 2024-08-10 17:04:12,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 7950, loss[loss=0.1096, beats_loss=0.01182, ecapa_loss=0.0002211, whisper_loss=0.09558, over 19907.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01197, ecapa_loss=0.0002384, whisper_loss=0.09411, over 3921674.13 frames. ], batch size: 81, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:04:14,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=659220.0, ans=0.0 2024-08-10 17:04:17,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659220.0, ans=0.125 2024-08-10 17:04:27,456 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 17:04:34,239 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 17:04:36,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. 
limit=22.5 2024-08-10 17:04:39,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=659420.0, ans=0.0 2024-08-10 17:05:05,848 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 17:05:19,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8000, loss[loss=0.08962, beats_loss=0.01179, ecapa_loss=0.0002161, whisper_loss=0.07567, over 14930.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01191, ecapa_loss=0.0002375, whisper_loss=0.09434, over 3915741.32 frames. ], batch size: 58, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:05:30,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=659820.0, ans=0.09899494936611666 2024-08-10 17:05:32,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-10 17:05:40,280 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 17:05:41,593 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 17:05:43,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=659820.0, ans=0.2 2024-08-10 17:05:44,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=659920.0, ans=0.125 2024-08-10 17:05:51,952 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 17:05:55,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.761e+01 3.157e+01 3.536e+01 5.933e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 17:05:58,699 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 17:06:02,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=660020.0, ans=0.125 2024-08-10 17:06:02,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=660020.0, ans=0.125 2024-08-10 17:06:15,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660120.0, ans=0.1 2024-08-10 17:06:25,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8050, loss[loss=0.1073, beats_loss=0.01375, ecapa_loss=0.000178, whisper_loss=0.09174, over 21927.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01186, ecapa_loss=0.0002384, whisper_loss=0.09463, over 3879152.90 frames. ], batch size: 86, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:06:26,872 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 17:06:31,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660220.0, ans=0.1 2024-08-10 17:06:35,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=660220.0, ans=0.125 2024-08-10 17:06:37,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=660320.0, ans=0.0 2024-08-10 17:06:44,366 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 17:06:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660320.0, ans=0.125 2024-08-10 17:06:47,083 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
15 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 17:06:47,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=660320.0, ans=0.125 2024-08-10 17:06:54,847 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 17:07:03,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=660420.0, ans=0.2 2024-08-10 17:07:14,667 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 17:07:26,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660620.0, ans=0.1 2024-08-10 17:07:27,163 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 17:07:32,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8100, loss[loss=0.1092, beats_loss=0.01048, ecapa_loss=0.0002825, whisper_loss=0.09592, over 14129.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01196, ecapa_loss=0.0002372, whisper_loss=0.09427, over 3889114.54 frames. 
], batch size: 57, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:07:45,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=660820.0, ans=0.125 2024-08-10 17:07:48,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660820.0, ans=0.1 2024-08-10 17:07:53,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=660820.0, ans=0.125 2024-08-10 17:08:00,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=660920.0, ans=0.05 2024-08-10 17:08:08,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=660920.0, ans=0.0 2024-08-10 17:08:09,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 3.015e+01 3.254e+01 3.867e+01 1.141e+02, threshold=6.509e+01, percent-clipped=2.0 2024-08-10 17:08:13,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=661020.0, ans=0.125 2024-08-10 17:08:20,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-10 17:08:25,461 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 17:08:28,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=661120.0, ans=0.125 2024-08-10 17:08:37,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.68 vs. 
limit=15.0 2024-08-10 17:08:38,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8150, loss[loss=0.1149, beats_loss=0.01228, ecapa_loss=0.0002641, whisper_loss=0.09998, over 15894.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01191, ecapa_loss=0.0002383, whisper_loss=0.09407, over 3893662.36 frames. ], batch size: 64, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:08:40,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-10 17:08:49,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.03 vs. limit=10.0 2024-08-10 17:08:51,169 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 17:09:15,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=661420.0, ans=0.125 2024-08-10 17:09:18,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=661520.0, ans=0.0 2024-08-10 17:09:20,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661520.0, ans=0.125 2024-08-10 17:09:24,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=661520.0, ans=0.125 2024-08-10 17:09:31,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=661620.0, ans=0.1 2024-08-10 17:09:40,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. 
limit=6.0 2024-08-10 17:09:45,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8200, loss[loss=0.1118, beats_loss=0.01008, ecapa_loss=0.0002933, whisper_loss=0.09874, over 19439.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01194, ecapa_loss=0.0002379, whisper_loss=0.09413, over 3886976.20 frames. ], batch size: 78, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:48,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:09:58,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-08-10 17:10:03,866 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 17:10:10,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=661920.0, ans=0.0 2024-08-10 17:10:15,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661920.0, ans=0.1 2024-08-10 17:10:17,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=661920.0, ans=0.2 2024-08-10 17:10:19,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=661920.0, ans=0.0 2024-08-10 17:10:21,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.877e+01 3.347e+01 3.681e+01 6.491e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-10 17:10:29,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.18 vs. 
limit=12.0 2024-08-10 17:10:32,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=662020.0, ans=0.2 2024-08-10 17:10:50,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8250, loss[loss=0.1046, beats_loss=0.0115, ecapa_loss=0.0002433, whisper_loss=0.09065, over 20031.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01194, ecapa_loss=0.0002376, whisper_loss=0.09451, over 3905474.63 frames. ], batch size: 80, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:11:00,659 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 17:11:03,305 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 17:11:10,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=662320.0, ans=0.0 2024-08-10 17:11:24,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=662420.0, ans=0.04949747468305833 2024-08-10 17:11:27,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=662420.0, ans=0.05 2024-08-10 17:11:45,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-08-10 17:11:47,371 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
22 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 17:11:48,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662620.0, ans=0.1 2024-08-10 17:11:54,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=662620.0, ans=0.0 2024-08-10 17:11:57,502 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8300, loss[loss=0.09405, beats_loss=0.01199, ecapa_loss=0.0001843, whisper_loss=0.08022, over 16445.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01192, ecapa_loss=0.0002387, whisper_loss=0.09459, over 3915437.88 frames. ], batch size: 62, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:11:57,650 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 17:11:57,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662720.0, ans=0.125 2024-08-10 17:12:01,993 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 17:12:02,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=662720.0, ans=0.025 2024-08-10 17:12:12,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=662820.0, ans=0.0 2024-08-10 17:12:26,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=662920.0, ans=0.05 2024-08-10 17:12:29,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. 
limit=15.0 2024-08-10 17:12:34,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.908e+01 3.363e+01 4.143e+01 6.461e+01, threshold=6.726e+01, percent-clipped=0.0 2024-08-10 17:12:38,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=663020.0, ans=0.0 2024-08-10 17:12:40,306 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 17:12:45,709 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 17:12:46,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=663020.0, ans=0.125 2024-08-10 17:12:48,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=663020.0, ans=0.2 2024-08-10 17:12:58,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=663120.0, ans=0.125 2024-08-10 17:12:59,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663120.0, ans=0.125 2024-08-10 17:13:04,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8350, loss[loss=0.1042, beats_loss=0.01358, ecapa_loss=0.0002239, whisper_loss=0.0884, over 21961.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01189, ecapa_loss=0.0002383, whisper_loss=0.0945, over 3907239.93 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:13:27,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-10 17:13:52,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. 
limit=15.0 2024-08-10 17:14:15,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=663720.0, ans=0.0 2024-08-10 17:14:15,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8400, loss[loss=0.08176, beats_loss=0.01488, ecapa_loss=0.0001761, whisper_loss=0.06512, over 14191.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01187, ecapa_loss=0.0002375, whisper_loss=0.09445, over 3887074.84 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:14:25,232 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 17:14:26,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=663720.0, ans=0.0 2024-08-10 17:14:36,136 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 17:14:52,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=663920.0, ans=0.125 2024-08-10 17:14:56,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.836e+01 3.172e+01 3.671e+01 5.154e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 17:15:06,012 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 17:15:11,877 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 17:15:17,155 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 17:15:17,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=664120.0, ans=0.125 2024-08-10 17:15:23,181 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 17:15:25,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664120.0, ans=0.1 2024-08-10 17:15:29,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8450, loss[loss=0.1072, beats_loss=0.01084, ecapa_loss=0.0002662, whisper_loss=0.09368, over 21194.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01187, ecapa_loss=0.0002371, whisper_loss=0.09413, over 3899759.99 frames. ], batch size: 88, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:15:31,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2024-08-10 17:15:40,913 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.491e+05 2024-08-10 17:15:42,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=664220.0, ans=0.125 2024-08-10 17:15:42,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664220.0, ans=0.125 2024-08-10 17:15:53,575 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 17:15:53,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=664320.0, ans=0.125 2024-08-10 17:15:56,344 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 17:16:00,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=664420.0, ans=0.125 2024-08-10 17:16:02,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=664420.0, ans=0.0 2024-08-10 17:16:05,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=664420.0, ans=0.0 2024-08-10 17:16:09,027 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 17:16:11,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=664420.0, ans=0.125 2024-08-10 17:16:34,141 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 17:16:42,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8500, loss[loss=0.1095, beats_loss=0.009146, ecapa_loss=0.0002769, whisper_loss=0.09756, over 15839.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01194, ecapa_loss=0.0002349, whisper_loss=0.09353, over 3898643.67 frames. ], batch size: 63, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:16:54,251 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
24 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 17:17:00,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=664820.0, ans=0.125 2024-08-10 17:17:14,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=664920.0, ans=0.125 2024-08-10 17:17:26,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.856e+01 3.264e+01 3.786e+01 7.141e+01, threshold=6.528e+01, percent-clipped=1.0 2024-08-10 17:17:39,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665020.0, ans=0.125 2024-08-10 17:17:55,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665120.0, ans=0.1 2024-08-10 17:18:00,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8550, loss[loss=0.1192, beats_loss=0.01154, ecapa_loss=0.0002809, whisper_loss=0.1049, over 22581.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01186, ecapa_loss=0.0002343, whisper_loss=0.09407, over 3917979.33 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:18:06,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5 2024-08-10 17:18:21,021 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-10 17:18:33,920 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 17:18:48,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. 
limit=22.5 2024-08-10 17:19:05,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=665620.0, ans=0.125 2024-08-10 17:19:08,351 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 17:19:13,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=665620.0, ans=0.0 2024-08-10 17:19:16,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8600, loss[loss=0.1076, beats_loss=0.01179, ecapa_loss=0.000314, whisper_loss=0.0927, over 21594.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01173, ecapa_loss=0.0002368, whisper_loss=0.09547, over 3884737.19 frames. ], batch size: 93, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:19:16,794 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-10 17:19:34,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=665820.0, ans=0.1 2024-08-10 17:19:54,036 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 17:20:05,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.897e+01 3.260e+01 3.635e+01 5.528e+01, threshold=6.520e+01, percent-clipped=0.0 2024-08-10 17:20:05,973 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 17:20:21,792 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 27 from LS+wenet, 19 from Vox, 14 fro AS 2024-08-10 17:20:41,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8650, loss[loss=0.1325, beats_loss=0.01205, ecapa_loss=0.0002393, whisper_loss=0.1181, over 20895.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01179, ecapa_loss=0.0002364, whisper_loss=0.09543, over 3866014.27 frames. 
], batch size: 82, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:20:46,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-10 17:20:48,729 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-10 17:20:58,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=666320.0, ans=0.125 2024-08-10 17:21:04,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=666320.0, ans=0.125 2024-08-10 17:21:11,998 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.227e+05 2024-08-10 17:21:19,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=666420.0, ans=0.05 2024-08-10 17:21:24,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.69 vs. limit=22.5 2024-08-10 17:21:25,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=666420.0, ans=0.0 2024-08-10 17:21:49,319 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 17:21:53,289 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 17:21:56,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=666620.0, ans=0.0 2024-08-10 17:21:59,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666620.0, ans=0.0 2024-08-10 17:22:14,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8700, loss[loss=0.1329, beats_loss=0.009662, ecapa_loss=0.0002396, whisper_loss=0.1209, over 15969.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01178, ecapa_loss=0.0002373, whisper_loss=0.09496, over 3840446.26 frames. ], batch size: 58, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:23:16,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.997e+01 3.503e+01 4.044e+01 1.535e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 17:23:16,792 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 17:23:22,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2024-08-10 17:23:24,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2024-08-10 17:23:30,606 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:23:30,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=667020.0, ans=0.0 2024-08-10 17:23:33,106 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 17:23:46,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=667120.0, ans=0.07 2024-08-10 17:23:58,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8750, loss[loss=0.1175, beats_loss=0.008351, ecapa_loss=0.0002614, whisper_loss=0.1065, over 19952.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01183, ecapa_loss=0.0002371, whisper_loss=0.09531, over 3837030.79 frames. ], batch size: 78, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:24:13,921 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 17:24:23,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-10 17:24:23,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-10 17:24:38,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=667320.0, ans=0.0 2024-08-10 17:24:41,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667320.0, ans=0.1 2024-08-10 17:24:50,242 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 17:24:53,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=667420.0, ans=0.0 2024-08-10 17:25:02,053 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-10 17:25:10,434 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 17:25:14,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=667520.0, ans=0.2 2024-08-10 17:25:28,620 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 17:25:57,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8800, loss[loss=0.09961, beats_loss=0.01165, ecapa_loss=0.0002183, whisper_loss=0.08578, over 20107.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01194, ecapa_loss=0.0002359, whisper_loss=0.09423, over 3853614.19 frames. ], batch size: 82, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:26:04,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=667720.0, ans=0.0 2024-08-10 17:26:10,143 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 17:26:13,854 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 17:26:20,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-08-10 17:26:32,511 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-10 17:27:04,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.842e+01 3.119e+01 3.570e+01 8.103e+01, threshold=6.239e+01, percent-clipped=1.0 2024-08-10 17:28:03,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8850, loss[loss=0.1156, beats_loss=0.009697, ecapa_loss=0.0002815, whisper_loss=0.1031, over 16091.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01189, ecapa_loss=0.0002365, whisper_loss=0.0943, over 3857362.93 frames. 
], batch size: 66, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:28:46,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0 2024-08-10 17:29:08,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-08-10 17:29:09,713 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 17:29:12,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-10 17:29:22,131 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 17:29:39,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=668620.0, ans=0.0 2024-08-10 17:29:40,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=668620.0, ans=0.125 2024-08-10 17:29:53,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8900, loss[loss=0.09609, beats_loss=0.01245, ecapa_loss=0.0002324, whisper_loss=0.08132, over 20948.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01182, ecapa_loss=0.0002367, whisper_loss=0.095, over 3883252.96 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:29:55,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668720.0, ans=0.125 2024-08-10 17:29:57,945 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 17:30:04,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=668720.0, ans=0.07 2024-08-10 17:30:10,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-08-10 17:30:13,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=668820.0, ans=0.2 2024-08-10 17:30:36,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.696e+01 3.082e+01 3.587e+01 7.840e+01, threshold=6.164e+01, percent-clipped=1.0 2024-08-10 17:31:11,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 8950, loss[loss=0.1003, beats_loss=0.009796, ecapa_loss=0.0003437, whisper_loss=0.08707, over 19655.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01186, ecapa_loss=0.0002369, whisper_loss=0.09394, over 3876464.89 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:31:18,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=669220.0, ans=0.125 2024-08-10 17:31:19,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=669220.0, ans=0.125 2024-08-10 17:31:22,939 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
13 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 17:31:30,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669320.0, ans=0.125 2024-08-10 17:31:43,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=669420.0, ans=0.0 2024-08-10 17:31:54,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=669420.0, ans=0.0 2024-08-10 17:32:19,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669620.0, ans=0.125 2024-08-10 17:32:23,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=669620.0, ans=0.02 2024-08-10 17:32:28,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9000, loss[loss=0.1001, beats_loss=0.0116, ecapa_loss=0.0002237, whisper_loss=0.08623, over 21603.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01185, ecapa_loss=0.0002375, whisper_loss=0.09425, over 3873332.34 frames. ], batch size: 85, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:32:28,728 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 17:33:04,116 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on ASR_libri: loss=0.2625, beats_loss=0, ecapa_loss=0.0007367, whisper_loss=0.2551, over 922467.00 frames. 2024-08-10 17:33:17,545 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7525, 3.8389, 4.3582, 4.2565], device='cuda:3') 2024-08-10 17:33:20,327 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on SV_voxceleb1: loss=0.006282, beats_loss=0, ecapa_loss=0.0006282, whisper_loss=0, over 939242.00 frames. 
2024-08-10 17:34:42,583 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5672, 2.7333, 2.9768, 2.6944], device='cuda:3') 2024-08-10 17:35:05,189 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on AT_audioset: loss=0.02673, beats_loss=0.02673, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 17:35:05,193 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 17:35:11,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2024-08-10 17:35:26,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=669820.0, ans=0.0 2024-08-10 17:35:26,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=669820.0, ans=0.125 2024-08-10 17:35:27,644 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 17:35:38,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=669920.0, ans=0.0 2024-08-10 17:35:47,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.797e+01 3.113e+01 3.593e+01 8.640e+01, threshold=6.226e+01, percent-clipped=2.0 2024-08-10 17:36:09,845 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 17:36:11,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=670120.0, ans=0.125 2024-08-10 17:36:14,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=670120.0, ans=0.125 2024-08-10 17:36:21,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9050, loss[loss=0.07754, beats_loss=0.01713, ecapa_loss=0.0001753, whisper_loss=0.05866, over 17765.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01177, ecapa_loss=0.0002376, whisper_loss=0.09465, over 3833235.00 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:36:23,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670220.0, ans=0.1 2024-08-10 17:36:25,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=670220.0, ans=0.125 2024-08-10 17:36:37,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=670320.0, ans=0.125 2024-08-10 17:36:42,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670320.0, ans=0.1 2024-08-10 17:37:00,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=670420.0, ans=0.125 2024-08-10 17:37:00,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670420.0, ans=0.1 2024-08-10 17:37:11,034 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-10 17:37:32,992 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 17:37:35,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9100, loss[loss=0.07576, beats_loss=0.01732, ecapa_loss=0.0002229, whisper_loss=0.0562, over 20669.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01181, ecapa_loss=0.0002392, whisper_loss=0.0945, over 3863119.54 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:37:38,396 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-10 17:37:39,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=670720.0, ans=0.125 2024-08-10 17:37:48,870 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 17:37:49,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=670820.0, ans=0.0 2024-08-10 17:37:56,677 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 17:38:01,840 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 17:38:05,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2024-08-10 17:38:16,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.910e+01 3.253e+01 3.723e+01 6.048e+01, threshold=6.507e+01, percent-clipped=0.0 2024-08-10 17:38:20,871 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 17:38:33,010 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 17:38:45,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=671120.0, ans=0.125 2024-08-10 17:38:49,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9150, loss[loss=0.1222, beats_loss=0.0088, ecapa_loss=0.0002596, whisper_loss=0.1108, over 15651.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0002391, whisper_loss=0.09464, over 3856399.56 frames. ], batch size: 61, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:39:00,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=671220.0, ans=0.125 2024-08-10 17:39:12,007 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 17:39:30,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=671420.0, ans=0.0 2024-08-10 17:39:34,243 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-10 17:39:37,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=671520.0, ans=0.125 2024-08-10 17:39:38,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2024-08-10 17:39:49,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=671520.0, ans=0.09899494936611666 2024-08-10 17:40:09,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9200, loss[loss=0.1238, beats_loss=0.0107, ecapa_loss=0.0002313, whisper_loss=0.1108, over 23087.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01188, ecapa_loss=0.0002381, whisper_loss=0.09413, over 3891636.66 frames. 
], batch size: 92, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:40:19,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-10 17:40:20,666 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 17:40:44,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=671920.0, ans=0.125 2024-08-10 17:40:53,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.732e+01 3.100e+01 3.483e+01 6.432e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 17:41:07,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=672020.0, ans=0.125 2024-08-10 17:41:18,412 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 17:41:20,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=672120.0, ans=0.2 2024-08-10 17:41:21,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=672120.0, ans=0.0 2024-08-10 17:41:27,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9250, loss[loss=0.1028, beats_loss=0.01419, ecapa_loss=0.0002048, whisper_loss=0.08659, over 18508.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01193, ecapa_loss=0.0002365, whisper_loss=0.09388, over 3869290.13 frames. ], batch size: 75, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:41:31,961 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 17:41:32,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=672220.0, ans=0.0 2024-08-10 17:41:48,415 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 17:41:56,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=672320.0, ans=0.125 2024-08-10 17:42:01,152 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 17:42:01,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=672420.0, ans=0.125 2024-08-10 17:42:03,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=672420.0, ans=0.125 2024-08-10 17:42:11,539 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 17:42:12,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=672520.0, ans=0.2 2024-08-10 17:42:12,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-10 17:42:14,722 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 17:42:29,719 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 17:42:40,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=672620.0, ans=0.125 2024-08-10 17:42:42,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9300, loss[loss=0.1041, beats_loss=0.01132, ecapa_loss=0.0002952, whisper_loss=0.08986, over 14954.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01187, ecapa_loss=0.0002373, whisper_loss=0.09429, over 3897433.16 frames. ], batch size: 62, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:42:57,841 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-10 17:43:03,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=672820.0, ans=0.0 2024-08-10 17:43:08,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-10 17:43:11,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5 2024-08-10 17:43:12,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-10 17:43:14,352 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 17:43:14,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=672920.0, ans=0.09899494936611666 2024-08-10 17:43:20,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=672920.0, ans=0.0 2024-08-10 17:43:28,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.985e+01 3.331e+01 3.923e+01 7.099e+01, threshold=6.662e+01, percent-clipped=2.0 2024-08-10 17:43:39,015 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 17:43:42,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=673020.0, ans=0.0 2024-08-10 17:44:05,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9350, loss[loss=0.1038, beats_loss=0.0112, ecapa_loss=0.000244, whisper_loss=0.09017, over 17333.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002373, whisper_loss=0.09486, over 3909389.99 frames. ], batch size: 68, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:44:10,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-08-10 17:44:12,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2024-08-10 17:44:13,098 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.684e-02 2024-08-10 17:44:20,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=673220.0, ans=0.2 2024-08-10 17:44:35,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=673320.0, ans=0.2 2024-08-10 17:44:39,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=673420.0, ans=0.125 2024-08-10 17:44:40,763 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 17:44:42,494 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 17:44:47,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673420.0, ans=0.1 2024-08-10 17:44:56,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=673520.0, ans=0.125 2024-08-10 17:45:07,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=673620.0, ans=0.0 2024-08-10 17:45:17,824 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-10 17:45:22,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9400, loss[loss=0.1046, beats_loss=0.01404, ecapa_loss=0.0002624, whisper_loss=0.08795, over 21941.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01179, ecapa_loss=0.0002383, whisper_loss=0.09509, over 3916884.52 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:45:24,841 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 13 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 17:45:26,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-10 17:45:40,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=673820.0, ans=0.125 2024-08-10 17:45:47,017 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
22 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 17:46:05,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.795e+01 3.115e+01 3.725e+01 7.083e+01, threshold=6.231e+01, percent-clipped=1.0 2024-08-10 17:46:24,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674120.0, ans=0.0 2024-08-10 17:46:36,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9450, loss[loss=0.1026, beats_loss=0.01516, ecapa_loss=0.0002242, whisper_loss=0.08519, over 22992.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01185, ecapa_loss=0.0002385, whisper_loss=0.0942, over 3893027.54 frames. ], batch size: 94, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:46:43,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=674220.0, ans=0.125 2024-08-10 17:46:56,396 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 17:47:22,861 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 17:47:28,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.92 vs. limit=22.5 2024-08-10 17:47:48,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9500, loss[loss=0.1192, beats_loss=0.01124, ecapa_loss=0.0002067, whisper_loss=0.1059, over 19320.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01172, ecapa_loss=0.0002384, whisper_loss=0.09544, over 3887341.68 frames. 
], batch size: 73, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:47:49,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674720.0, ans=0.1 2024-08-10 17:47:53,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-10 17:47:59,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=674720.0, ans=0.125 2024-08-10 17:48:04,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=674820.0, ans=0.2 2024-08-10 17:48:11,390 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 17:48:12,636 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 17:48:13,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.95 vs. limit=10.0 2024-08-10 17:48:14,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=674820.0, ans=0.0 2024-08-10 17:48:23,919 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-10 17:48:31,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. 
limit=15.0 2024-08-10 17:48:33,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.877e+01 3.250e+01 3.723e+01 7.953e+01, threshold=6.499e+01, percent-clipped=3.0 2024-08-10 17:48:34,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=674920.0, ans=0.0 2024-08-10 17:48:39,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2024-08-10 17:48:58,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=675120.0, ans=0.04949747468305833 2024-08-10 17:48:58,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=675120.0, ans=0.0 2024-08-10 17:49:05,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9550, loss[loss=0.117, beats_loss=0.01101, ecapa_loss=0.0002224, whisper_loss=0.1038, over 18193.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01175, ecapa_loss=0.0002382, whisper_loss=0.09504, over 3881911.89 frames. ], batch size: 72, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:49:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=675220.0, ans=0.5 2024-08-10 17:49:20,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=675320.0, ans=0.125 2024-08-10 17:49:23,079 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 17:49:40,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675420.0, ans=0.1 2024-08-10 17:49:42,045 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 17:49:44,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=675420.0, ans=0.2 2024-08-10 17:49:45,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=675420.0, ans=0.0 2024-08-10 17:49:52,669 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 17:50:00,720 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 17:50:16,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2024-08-10 17:50:21,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9600, loss[loss=0.1124, beats_loss=0.01283, ecapa_loss=0.000216, whisper_loss=0.09739, over 22464.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01174, ecapa_loss=0.0002396, whisper_loss=0.09495, over 3858376.86 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:50:22,948 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 17:50:25,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=675720.0, ans=0.125 2024-08-10 17:50:36,039 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-10 17:50:39,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. 
limit=22.5 2024-08-10 17:50:44,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=675820.0, ans=0.02 2024-08-10 17:50:54,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-10 17:51:02,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.698e+01 2.997e+01 3.348e+01 4.884e+01, threshold=5.995e+01, percent-clipped=0.0 2024-08-10 17:51:18,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=676120.0, ans=0.2 2024-08-10 17:51:28,429 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 17:51:30,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=676120.0, ans=0.2 2024-08-10 17:51:32,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9650, loss[loss=0.1052, beats_loss=0.01289, ecapa_loss=0.000186, whisper_loss=0.09047, over 18291.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01187, ecapa_loss=0.0002379, whisper_loss=0.09398, over 3807777.17 frames. ], batch size: 68, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:51:40,217 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 17:51:41,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=676220.0, ans=0.2 2024-08-10 17:51:47,760 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 17:51:55,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. 
limit=15.0 2024-08-10 17:52:33,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.23 vs. limit=15.0 2024-08-10 17:52:38,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=676620.0, ans=0.125 2024-08-10 17:52:45,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9700, loss[loss=0.1226, beats_loss=0.01014, ecapa_loss=0.0002516, whisper_loss=0.11, over 22626.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01181, ecapa_loss=0.0002394, whisper_loss=0.09391, over 3827607.23 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:52:45,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=676720.0, ans=0.125 2024-08-10 17:52:47,462 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 17:52:51,394 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-10 17:52:54,272 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 17:52:55,831 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 17:53:12,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=676820.0, ans=0.125 2024-08-10 17:53:20,981 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
14 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 17:53:22,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=676920.0, ans=0.125 2024-08-10 17:53:25,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=676920.0, ans=0.0 2024-08-10 17:53:27,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.851e+01 3.065e+01 3.509e+01 5.015e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 17:53:32,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-10 17:53:33,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=677020.0, ans=0.0 2024-08-10 17:53:40,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=677020.0, ans=0.0 2024-08-10 17:53:44,433 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 17:53:47,083 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 17:53:50,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=677120.0, ans=0.025 2024-08-10 17:53:59,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9750, loss[loss=0.1299, beats_loss=0.008373, ecapa_loss=0.0002724, whisper_loss=0.1188, over 18058.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01186, ecapa_loss=0.000238, whisper_loss=0.09389, over 3817672.84 frames. ], batch size: 74, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:54:51,976 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 17:55:01,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=677620.0, ans=0.125 2024-08-10 17:55:03,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=677620.0, ans=0.0 2024-08-10 17:55:12,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9800, loss[loss=0.09819, beats_loss=0.01481, ecapa_loss=0.000183, whisper_loss=0.08155, over 18620.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01185, ecapa_loss=0.0002354, whisper_loss=0.0942, over 3830988.95 frames. ], batch size: 74, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:55:15,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=677720.0, ans=0.125 2024-08-10 17:55:32,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677820.0, ans=0.1 2024-08-10 17:55:37,163 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 17:55:46,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=677920.0, ans=0.2 2024-08-10 17:55:50,262 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 17:55:54,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.700e+01 3.065e+01 3.596e+01 6.450e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-10 17:55:56,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. 
limit=6.0 2024-08-10 17:56:01,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=678020.0, ans=0.04949747468305833 2024-08-10 17:56:09,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=678020.0, ans=0.125 2024-08-10 17:56:20,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678120.0, ans=0.1 2024-08-10 17:56:25,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=678220.0, ans=0.125 2024-08-10 17:56:25,530 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.549e+05 2024-08-10 17:56:25,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-08-10 17:56:26,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9850, loss[loss=0.1145, beats_loss=0.009048, ecapa_loss=0.0002561, whisper_loss=0.1029, over 21078.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01186, ecapa_loss=0.0002355, whisper_loss=0.09438, over 3830732.16 frames. ], batch size: 84, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:56:30,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=678220.0, ans=0.1 2024-08-10 17:56:53,433 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 17:57:12,279 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 17:57:12,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=678520.0, ans=0.0 2024-08-10 17:57:41,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9900, loss[loss=0.09659, beats_loss=0.0132, ecapa_loss=0.000258, whisper_loss=0.08081, over 21834.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01193, ecapa_loss=0.0002336, whisper_loss=0.09366, over 3850647.91 frames. ], batch size: 95, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:57:42,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2024-08-10 17:57:47,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=678720.0, ans=0.0 2024-08-10 17:57:55,796 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 17:57:55,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=678820.0, ans=0.125 2024-08-10 17:57:58,303 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 17:58:02,463 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 17:58:19,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.724e+01 3.027e+01 3.695e+01 5.994e+01, threshold=6.053e+01, percent-clipped=0.0 2024-08-10 17:58:40,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. 
limit=15.0
2024-08-10 17:58:42,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=679120.0, ans=0.0
2024-08-10 17:58:47,436 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS
2024-08-10 17:58:50,511 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 9950, loss[loss=0.1209, beats_loss=0.008445, ecapa_loss=0.0002916, whisper_loss=0.1096, over 22185.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0118, ecapa_loss=0.0002364, whisper_loss=0.09464, over 3854269.45 frames. ], batch size: 85, lr: 1.21e-02, grad_scale: 274877906944.0
2024-08-10 17:58:52,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=679220.0, ans=0.2
2024-08-10 17:59:05,530 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS
2024-08-10 17:59:14,200 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 17:59:16,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679320.0, ans=0.1
2024-08-10 17:59:27,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=679420.0, ans=0.125
2024-08-10 17:59:40,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0
2024-08-10 17:59:45,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=679520.0, ans=0.125
2024-08-10 17:59:45,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0
2024-08-10 17:59:46,385 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS
2024-08-10 17:59:57,116 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.552e-01
2024-08-10 18:00:04,315 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10000, loss[loss=0.118, beats_loss=0.01251, ecapa_loss=0.0001941, whisper_loss=0.1036, over 23642.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01171, ecapa_loss=0.0002389, whisper_loss=0.09512, over 3832221.63 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0
2024-08-10 18:00:05,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=679720.0, ans=0.04949747468305833
2024-08-10 18:00:17,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
2024-08-10 18:00:21,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=679820.0, ans=0.0
2024-08-10 18:00:30,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=679820.0, ans=0.125
2024-08-10 18:00:37,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=679920.0, ans=0.125
2024-08-10 18:00:47,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.793e+01 3.118e+01 3.876e+01 5.816e+01, threshold=6.237e+01, percent-clipped=0.0
2024-08-10 18:00:47,539 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 22 from Vox, 31 from AS
2024-08-10 18:00:54,362 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 from AS
2024-08-10 18:00:57,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680020.0, ans=0.125
2024-08-10 18:01:03,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=680120.0, ans=0.125
2024-08-10 18:01:14,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=680120.0, ans=0.0
2024-08-10 18:01:16,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=680220.0, ans=0.0
2024-08-10 18:01:17,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10050, loss[loss=0.09334, beats_loss=0.01424, ecapa_loss=0.0001895, whisper_loss=0.07721, over 16809.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01174, ecapa_loss=0.0002376, whisper_loss=0.09448, over 3832180.99 frames. ], batch size: 65, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:01:30,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=680320.0, ans=0.04949747468305833
2024-08-10 18:01:31,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0
2024-08-10 18:01:33,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=680320.0, ans=0.125
2024-08-10 18:01:40,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=680320.0, ans=0.125
2024-08-10 18:01:57,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=680420.0, ans=0.125
2024-08-10 18:02:12,031 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 from AS
2024-08-10 18:02:30,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10100, loss[loss=0.1117, beats_loss=0.01042, ecapa_loss=0.0002471, whisper_loss=0.09879, over 21229.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01171, ecapa_loss=0.0002381, whisper_loss=0.09505, over 3861476.79 frames. ], batch size: 84, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:02:41,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2024-08-10 18:03:12,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.905e+01 3.182e+01 3.646e+01 5.979e+01, threshold=6.363e+01, percent-clipped=0.0
2024-08-10 18:03:21,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681020.0, ans=0.1
2024-08-10 18:03:31,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=681020.0, ans=15.0
2024-08-10 18:03:43,895 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS
2024-08-10 18:03:48,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10150, loss[loss=0.1092, beats_loss=0.01369, ecapa_loss=0.0002081, whisper_loss=0.09348, over 23083.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01179, ecapa_loss=0.0002375, whisper_loss=0.09499, over 3861389.94 frames. ], batch size: 90, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:03:52,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681220.0, ans=0.1
2024-08-10 18:03:53,270 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 from AS
2024-08-10 18:03:56,632 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS
2024-08-10 18:04:03,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=681220.0, ans=0.05
2024-08-10 18:04:06,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681320.0, ans=0.1
2024-08-10 18:04:16,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0
2024-08-10 18:04:21,270 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 20 from Vox, 50 from AS
2024-08-10 18:04:21,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681420.0, ans=0.1
2024-08-10 18:04:33,469 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 from AS
2024-08-10 18:04:36,919 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.782e-02
2024-08-10 18:04:44,716 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts.
27 from LS+wenet, 27 from Vox, 29 from AS
2024-08-10 18:04:55,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=681620.0, ans=0.0
2024-08-10 18:04:56,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0
2024-08-10 18:05:01,722 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS
2024-08-10 18:05:03,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0
2024-08-10 18:05:09,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10200, loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0003027, whisper_loss=0.08813, over 21085.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01179, ecapa_loss=0.0002376, whisper_loss=0.09523, over 3891220.13 frames. ], batch size: 94, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:05:25,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=681820.0, ans=0.125
2024-08-10 18:05:30,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=681820.0, ans=0.125
2024-08-10 18:05:30,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=681820.0, ans=0.2
2024-08-10 18:05:34,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0
2024-08-10 18:05:40,304 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS
2024-08-10 18:05:54,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.852e+01 3.122e+01 3.821e+01 7.643e+01, threshold=6.244e+01, percent-clipped=3.0
2024-08-10 18:05:55,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681920.0, ans=0.125
2024-08-10 18:06:14,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=682120.0, ans=0.0
2024-08-10 18:06:28,071 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10250, loss[loss=0.1183, beats_loss=0.01104, ecapa_loss=0.0001858, whisper_loss=0.1054, over 17450.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01175, ecapa_loss=0.0002374, whisper_loss=0.09544, over 3893691.98 frames. ], batch size: 65, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:06:31,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=682220.0, ans=0.0
2024-08-10 18:06:37,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5
2024-08-10 18:07:08,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=682420.0, ans=0.0
2024-08-10 18:07:09,413 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-10 18:07:19,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=682520.0, ans=0.125
2024-08-10 18:07:19,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=682520.0, ans=0.125
2024-08-10 18:07:20,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0
2024-08-10 18:07:33,975 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 from AS
2024-08-10 18:07:34,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=682620.0, ans=0.125
2024-08-10 18:07:37,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=682620.0, ans=0.0
2024-08-10 18:07:37,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=682620.0, ans=0.0
2024-08-10 18:07:46,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10300, loss[loss=0.1043, beats_loss=0.01194, ecapa_loss=0.0002739, whisper_loss=0.08961, over 22964.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0002359, whisper_loss=0.09466, over 3861625.07 frames. ], batch size: 95, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:08:10,002 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 28 from Vox, 29 from AS
2024-08-10 18:08:16,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=682920.0, ans=0.125
2024-08-10 18:08:19,310 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 from AS
2024-08-10 18:08:29,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.976e+01 3.282e+01 3.794e+01 5.948e+01, threshold=6.564e+01, percent-clipped=0.0
2024-08-10 18:08:37,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=683020.0, ans=0.125
2024-08-10 18:08:54,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5
2024-08-10 18:08:58,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683120.0, ans=0.0
2024-08-10 18:08:59,889 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 18:09:02,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10350, loss[loss=0.1117, beats_loss=0.01328, ecapa_loss=0.0002517, whisper_loss=0.09589, over 22063.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0002368, whisper_loss=0.09466, over 3903084.01 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:09:17,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=683320.0, ans=0.1
2024-08-10 18:09:23,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0
2024-08-10 18:09:28,724 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 12 from Vox, 23 from AS
2024-08-10 18:09:31,979 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
29 from LS+wenet, 31 from Vox, 29 from AS
2024-08-10 18:09:32,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. limit=10.0
2024-08-10 18:09:48,142 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 from AS
2024-08-10 18:09:48,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683520.0, ans=0.1
2024-08-10 18:09:56,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=683520.0, ans=0.07
2024-08-10 18:10:13,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=15.0
2024-08-10 18:10:14,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683620.0, ans=0.1
2024-08-10 18:10:15,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=683620.0, ans=0.0
2024-08-10 18:10:20,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10400, loss[loss=0.06941, beats_loss=0.01483, ecapa_loss=0.0002938, whisper_loss=0.05164, over 12669.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01177, ecapa_loss=0.0002361, whisper_loss=0.095, over 3883820.77 frames. ], batch size: 57, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:10:20,948 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 from AS
2024-08-10 18:10:41,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2024-08-10 18:11:01,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0
2024-08-10 18:11:02,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.804e+01 3.184e+01 3.674e+01 7.007e+01, threshold=6.369e+01, percent-clipped=1.0
2024-08-10 18:11:20,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=684120.0, ans=0.05
2024-08-10 18:11:22,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2024-08-10 18:11:25,237 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS
2024-08-10 18:11:34,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10450, loss[loss=0.1237, beats_loss=0.009564, ecapa_loss=0.0002629, whisper_loss=0.1115, over 22947.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01176, ecapa_loss=0.0002359, whisper_loss=0.09508, over 3860276.59 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 549755813888.0
2024-08-10 18:12:06,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=684420.0, ans=0.125
2024-08-10 18:12:17,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5
2024-08-10 18:12:32,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=684520.0, ans=0.125
2024-08-10 18:12:43,841 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 from AS
2024-08-10 18:12:45,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684620.0, ans=0.125
2024-08-10 18:12:54,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10500, loss[loss=0.1026, beats_loss=0.01281, ecapa_loss=0.0002585, whisper_loss=0.08717, over 17381.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01182, ecapa_loss=0.0002351, whisper_loss=0.09484, over 3839727.53 frames. ], batch size: 69, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:13:00,805 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 from AS
2024-08-10 18:13:09,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2024-08-10 18:13:17,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=684820.0, ans=0.125
2024-08-10 18:13:24,253 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 from AS
2024-08-10 18:13:27,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0
2024-08-10 18:13:35,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.920e+01 3.144e+01 3.815e+01 6.100e+01, threshold=6.288e+01, percent-clipped=0.0
2024-08-10 18:13:36,011 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 24 from Vox, 29 from AS
2024-08-10 18:13:36,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=684920.0, ans=0.0
2024-08-10 18:13:50,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=685020.0, ans=0.0
2024-08-10 18:13:59,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=685120.0, ans=0.0
2024-08-10 18:14:01,809 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-10 18:14:09,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10550, loss[loss=0.1193, beats_loss=0.01001, ecapa_loss=0.0002934, whisper_loss=0.1063, over 22194.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01183, ecapa_loss=0.0002356, whisper_loss=0.09473, over 3836827.45 frames. ], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:14:29,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5
2024-08-10 18:14:32,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=685320.0, ans=0.0
2024-08-10 18:14:40,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=685420.0, ans=0.125
2024-08-10 18:14:44,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685420.0, ans=0.125
2024-08-10 18:14:50,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=685420.0, ans=0.2
2024-08-10 18:14:58,528 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
18 from LS+wenet, 11 from Vox, 25 from AS
2024-08-10 18:15:28,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10600, loss[loss=0.09938, beats_loss=0.01232, ecapa_loss=0.0001834, whisper_loss=0.08523, over 22091.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01178, ecapa_loss=0.000235, whisper_loss=0.09437, over 3825842.47 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:15:31,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=685720.0, ans=0.0
2024-08-10 18:15:33,775 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 25 from Vox, 28 from AS
2024-08-10 18:16:08,169 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS
2024-08-10 18:16:12,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.764e+01 3.108e+01 3.489e+01 4.887e+01, threshold=6.215e+01, percent-clipped=0.0
2024-08-10 18:16:21,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686020.0, ans=0.1
2024-08-10 18:16:24,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=686020.0, ans=0.05
2024-08-10 18:16:46,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10650, loss[loss=0.1166, beats_loss=0.01061, ecapa_loss=0.0002298, whisper_loss=0.1037, over 22204.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01183, ecapa_loss=0.0002345, whisper_loss=0.09417, over 3833300.06 frames. ], batch size: 87, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:16:51,811 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 from AS
2024-08-10 18:16:55,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5
2024-08-10 18:16:59,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=686220.0, ans=0.0
2024-08-10 18:17:33,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-08-10 18:17:39,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=686520.0, ans=0.0
2024-08-10 18:17:40,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=686520.0, ans=0.125
2024-08-10 18:17:55,137 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS
2024-08-10 18:18:04,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10700, loss[loss=0.1064, beats_loss=0.01162, ecapa_loss=0.0002421, whisper_loss=0.0924, over 20371.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01184, ecapa_loss=0.0002329, whisper_loss=0.09427, over 3868975.64 frames. ], batch size: 83, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:18:15,463 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 14 from Vox, 30 from AS
2024-08-10 18:18:47,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.883e+01 3.231e+01 3.765e+01 5.379e+01, threshold=6.463e+01, percent-clipped=0.0
2024-08-10 18:18:58,099 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 from AS
2024-08-10 18:19:19,710 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 18 from Vox, 36 from AS
2024-08-10 18:19:23,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10750, loss[loss=0.1035, beats_loss=0.01301, ecapa_loss=0.0002006, whisper_loss=0.08849, over 22977.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002335, whisper_loss=0.09484, over 3872732.07 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:19:33,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=687220.0, ans=0.125
2024-08-10 18:19:36,680 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 from AS
2024-08-10 18:19:38,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687320.0, ans=0.125
2024-08-10 18:19:47,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=687320.0, ans=0.0
2024-08-10 18:20:08,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687520.0, ans=0.1
2024-08-10 18:20:18,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=687520.0, ans=0.125
2024-08-10 18:20:25,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=687620.0, ans=0.125
2024-08-10 18:20:28,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=12.0
2024-08-10 18:20:33,960 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 from AS
2024-08-10 18:20:40,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10800, loss[loss=0.08863, beats_loss=0.01177, ecapa_loss=0.0002218, whisper_loss=0.07464, over 17369.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01177, ecapa_loss=0.0002338, whisper_loss=0.09498, over 3858679.48 frames. ], batch size: 71, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:20:41,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687720.0, ans=0.1
2024-08-10 18:20:55,810 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 from AS
2024-08-10 18:21:07,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=687820.0, ans=0.0
2024-08-10 18:21:10,934 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS
2024-08-10 18:21:22,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=687920.0, ans=0.0
2024-08-10 18:21:23,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 2.760e+01 3.130e+01 3.473e+01 5.037e+01, threshold=6.260e+01, percent-clipped=0.0
2024-08-10 18:21:31,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=688020.0, ans=0.125
2024-08-10 18:21:37,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=688020.0, ans=0.1
2024-08-10 18:21:49,498 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 from AS
2024-08-10 18:21:50,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5
2024-08-10 18:21:51,873 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 22 from Vox, 21 from AS
2024-08-10 18:21:57,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10850, loss[loss=0.1076, beats_loss=0.01194, ecapa_loss=0.0002526, whisper_loss=0.09311, over 21690.00 frames.
], tot_loss[loss=0.1099, beats_loss=0.01178, ecapa_loss=0.0002329, whisper_loss=0.09584, over 3903836.80 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:21:59,680 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 from AS
2024-08-10 18:22:05,336 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 21 from Vox, 17 from AS
2024-08-10 18:22:28,914 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 24 from Vox, 24 from AS
2024-08-10 18:22:29,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=688420.0, ans=0.125
2024-08-10 18:22:30,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=688420.0, ans=0.125
2024-08-10 18:22:32,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=688420.0, ans=0.0
2024-08-10 18:22:41,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688420.0, ans=0.1
2024-08-10 18:22:47,749 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 from AS
2024-08-10 18:22:52,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0
2024-08-10 18:23:04,184 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 from AS
2024-08-10 18:23:15,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10900, loss[loss=0.1174, beats_loss=0.01362, ecapa_loss=0.0001813, whisper_loss=0.1019, over 23519.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01164, ecapa_loss=0.0002337, whisper_loss=0.09725, over 3912058.59 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:23:44,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=688820.0, ans=0.125
2024-08-10 18:23:45,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0
2024-08-10 18:24:02,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.842e+01 3.313e+01 3.977e+01 6.808e+01, threshold=6.627e+01, percent-clipped=2.0
2024-08-10 18:24:02,352 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS
2024-08-10 18:24:12,818 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 from AS
2024-08-10 18:24:20,657 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 from AS
2024-08-10 18:24:36,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 10950, loss[loss=0.1178, beats_loss=0.009561, ecapa_loss=0.0002904, whisper_loss=0.1053, over 21779.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01174, ecapa_loss=0.0002308, whisper_loss=0.09565, over 3900112.77 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:24:44,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=689220.0, ans=0.125
2024-08-10 18:24:47,490 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 from AS
2024-08-10 18:24:47,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=689220.0, ans=0.07
2024-08-10 18:24:50,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=689320.0, ans=0.125
2024-08-10 18:24:56,386 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS
2024-08-10 18:25:06,160 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS
2024-08-10 18:25:08,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689420.0, ans=0.125
2024-08-10 18:25:19,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=689420.0, ans=0.125
2024-08-10 18:25:26,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5
2024-08-10 18:25:33,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2024-08-10 18:25:45,730 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 17 from Vox, 28 from AS
2024-08-10 18:25:55,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11000, loss[loss=0.1212, beats_loss=0.007225, ecapa_loss=0.0002905, whisper_loss=0.111, over 15944.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0117, ecapa_loss=0.0002338, whisper_loss=0.09598, over 3914821.69 frames. ], batch size: 64, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:25:56,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=689720.0, ans=0.2
2024-08-10 18:25:59,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=689720.0, ans=0.125
2024-08-10 18:26:00,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=689720.0, ans=0.125
2024-08-10 18:26:08,935 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 from AS
2024-08-10 18:26:19,722 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 from AS
2024-08-10 18:26:21,127 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS
2024-08-10 18:26:23,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2024-08-10 18:26:32,990 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS
2024-08-10 18:26:41,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.857e+01 3.230e+01 3.620e+01 6.298e+01, threshold=6.460e+01, percent-clipped=0.0
2024-08-10 18:26:51,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=690020.0, ans=0.125
2024-08-10 18:27:10,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=690120.0, ans=0.07
2024-08-10 18:27:16,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11050, loss[loss=0.126, beats_loss=0.009172, ecapa_loss=0.0002662, whisper_loss=0.1141, over 19793.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01172, ecapa_loss=0.0002334, whisper_loss=0.09549, over 3919981.76 frames. ], batch size: 76, lr: 1.20e-02, grad_scale: 549755813888.0
2024-08-10 18:27:31,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=690220.0, ans=0.2
2024-08-10 18:27:37,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0
2024-08-10 18:27:44,093 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 18:27:48,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0
2024-08-10 18:27:48,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.07 vs. limit=15.0
2024-08-10 18:28:10,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690520.0, ans=0.0
2024-08-10 18:28:14,963 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS
2024-08-10 18:28:20,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=690620.0, ans=10.0
2024-08-10 18:28:20,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0
2024-08-10 18:28:29,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=690620.0, ans=0.125
2024-08-10 18:28:34,433 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 18:28:36,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11100, loss[loss=0.1227, beats_loss=0.009837, ecapa_loss=0.0002491, whisper_loss=0.1104, over 16009.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01172, ecapa_loss=0.0002332, whisper_loss=0.09502, over 3889848.33 frames. ], batch size: 64, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:29:00,609 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 18:29:18,975 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.710e+01 3.196e+01 3.800e+01 5.125e+01, threshold=6.392e+01, percent-clipped=0.0 2024-08-10 18:29:21,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=691020.0, ans=0.0 2024-08-10 18:29:32,213 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 18:29:36,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-08-10 18:29:54,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11150, loss[loss=0.108, beats_loss=0.01287, ecapa_loss=0.000208, whisper_loss=0.093, over 23383.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01172, ecapa_loss=0.0002337, whisper_loss=0.09497, over 3896136.98 frames. 
], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:30:15,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=691320.0, ans=0.125 2024-08-10 18:30:18,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=691320.0, ans=0.0 2024-08-10 18:30:23,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=691320.0, ans=0.2 2024-08-10 18:30:30,654 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 36 from Vox, 31 fro AS 2024-08-10 18:30:56,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=691520.0, ans=0.0 2024-08-10 18:30:59,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=691620.0, ans=0.0 2024-08-10 18:31:14,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11200, loss[loss=0.09661, beats_loss=0.01174, ecapa_loss=0.0001789, whisper_loss=0.08308, over 14478.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01163, ecapa_loss=0.0002327, whisper_loss=0.09502, over 3889485.07 frames. ], batch size: 53, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:31:37,336 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 18:31:56,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.779e+01 3.196e+01 3.588e+01 6.419e+01, threshold=6.392e+01, percent-clipped=1.0 2024-08-10 18:31:57,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=691920.0, ans=0.125 2024-08-10 18:32:04,228 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 18:32:07,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=692020.0, ans=0.125 2024-08-10 18:32:07,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=692020.0, ans=0.125 2024-08-10 18:32:25,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-08-10 18:32:28,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=692120.0, ans=0.125 2024-08-10 18:32:31,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11250, loss[loss=0.1121, beats_loss=0.00866, ecapa_loss=0.0002729, whisper_loss=0.1007, over 19394.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01167, ecapa_loss=0.0002336, whisper_loss=0.09484, over 3869421.57 frames. ], batch size: 79, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:32:44,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-10 18:32:44,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. 
limit=15.0 2024-08-10 18:32:45,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=692220.0, ans=0.125 2024-08-10 18:33:01,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=692320.0, ans=0.2 2024-08-10 18:33:02,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=692420.0, ans=0.125 2024-08-10 18:33:03,953 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 18:33:10,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692420.0, ans=0.1 2024-08-10 18:33:10,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692420.0, ans=0.1 2024-08-10 18:33:17,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=692520.0, ans=0.125 2024-08-10 18:33:51,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11300, loss[loss=0.09659, beats_loss=0.01377, ecapa_loss=0.0002037, whisper_loss=0.08078, over 21116.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01174, ecapa_loss=0.0002305, whisper_loss=0.09509, over 3892883.68 frames. ], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:33:56,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=692720.0, ans=0.125 2024-08-10 18:33:57,355 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
26 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 18:34:08,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=692820.0, ans=0.2 2024-08-10 18:34:15,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=692820.0, ans=0.125 2024-08-10 18:34:34,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=692920.0, ans=0.05 2024-08-10 18:34:35,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.891e+01 3.346e+01 3.835e+01 5.621e+01, threshold=6.692e+01, percent-clipped=0.0 2024-08-10 18:34:55,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693120.0, ans=0.125 2024-08-10 18:35:01,201 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 18:35:05,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=693120.0, ans=0.125 2024-08-10 18:35:05,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2024-08-10 18:35:06,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.63 vs. limit=10.0 2024-08-10 18:35:07,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=693120.0, ans=0.0 2024-08-10 18:35:08,080 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 18:35:08,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=693220.0, ans=0.0 2024-08-10 18:35:09,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11350, loss[loss=0.1256, beats_loss=0.007444, ecapa_loss=0.0002744, whisper_loss=0.1154, over 22425.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01164, ecapa_loss=0.0002323, whisper_loss=0.09544, over 3875750.16 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:35:24,903 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 18:35:43,046 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 18:35:44,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=693420.0, ans=0.125 2024-08-10 18:35:48,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2024-08-10 18:35:52,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-10 18:36:00,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.18 vs. 
limit=15.0 2024-08-10 18:36:06,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693520.0, ans=0.125 2024-08-10 18:36:12,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693620.0, ans=0.1 2024-08-10 18:36:19,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693620.0, ans=0.125 2024-08-10 18:36:24,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11400, loss[loss=0.1121, beats_loss=0.01323, ecapa_loss=0.000256, whisper_loss=0.09629, over 21690.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01169, ecapa_loss=0.0002322, whisper_loss=0.09549, over 3838653.88 frames. ], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:36:58,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2024-08-10 18:36:59,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693920.0, ans=0.125 2024-08-10 18:37:00,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=693920.0, ans=0.125 2024-08-10 18:37:07,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.890e+01 3.279e+01 3.857e+01 6.641e+01, threshold=6.557e+01, percent-clipped=0.0 2024-08-10 18:37:08,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694020.0, ans=0.1 2024-08-10 18:37:09,969 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 18:37:22,715 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 18:37:29,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=694120.0, ans=0.2 2024-08-10 18:37:39,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11450, loss[loss=0.09569, beats_loss=0.01261, ecapa_loss=0.0001896, whisper_loss=0.08118, over 14183.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01176, ecapa_loss=0.0002328, whisper_loss=0.09486, over 3840311.40 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:38:01,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=694320.0, ans=0.125 2024-08-10 18:38:06,647 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 18:38:08,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-10 18:38:17,656 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 18:38:17,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=694420.0, ans=0.125 2024-08-10 18:38:31,307 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 18:38:35,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2024-08-10 18:38:38,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=694520.0, ans=0.125 2024-08-10 18:38:42,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.13 vs. 
limit=10.0 2024-08-10 18:38:46,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=694620.0, ans=0.125 2024-08-10 18:38:52,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=694620.0, ans=0.125 2024-08-10 18:38:57,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11500, loss[loss=0.1159, beats_loss=0.00984, ecapa_loss=0.0002321, whisper_loss=0.1038, over 16841.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01175, ecapa_loss=0.0002327, whisper_loss=0.09506, over 3832565.76 frames. ], batch size: 63, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:39:05,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=694720.0, ans=0.125 2024-08-10 18:39:06,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694720.0, ans=0.1 2024-08-10 18:39:15,285 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 18:39:19,019 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 18:39:31,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694920.0, ans=0.125 2024-08-10 18:39:39,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=694920.0, ans=0.125 2024-08-10 18:39:40,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.741e+01 3.082e+01 3.618e+01 5.964e+01, threshold=6.164e+01, percent-clipped=0.0 2024-08-10 18:39:45,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=695020.0, ans=0.0 2024-08-10 18:40:14,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11550, loss[loss=0.1155, beats_loss=0.009382, ecapa_loss=0.0002257, whisper_loss=0.1039, over 16484.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01175, ecapa_loss=0.0002328, whisper_loss=0.09531, over 3823527.98 frames. ], batch size: 66, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:40:17,105 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 18:40:35,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.83 vs. limit=15.0 2024-08-10 18:40:49,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695420.0, ans=0.1 2024-08-10 18:40:57,654 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 18:40:58,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=695420.0, ans=0.0 2024-08-10 18:41:12,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=695520.0, ans=0.125 2024-08-10 18:41:15,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=695520.0, ans=0.0 2024-08-10 18:41:19,156 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-10 18:41:33,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11600, loss[loss=0.1193, beats_loss=0.007346, ecapa_loss=0.0002163, whisper_loss=0.1098, over 15227.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01168, ecapa_loss=0.0002331, whisper_loss=0.0956, over 3837936.82 frames. ], batch size: 54, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:41:36,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=695720.0, ans=0.09899494936611666 2024-08-10 18:42:05,511 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 18:42:16,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.928e+01 3.314e+01 3.952e+01 8.355e+01, threshold=6.627e+01, percent-clipped=1.0 2024-08-10 18:42:27,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=696020.0, ans=0.04949747468305833 2024-08-10 18:42:31,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=696020.0, ans=0.125 2024-08-10 18:42:49,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11650, loss[loss=0.1011, beats_loss=0.01139, ecapa_loss=0.0002498, whisper_loss=0.0872, over 17232.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01171, ecapa_loss=0.000232, whisper_loss=0.09599, over 3850605.37 frames. ], batch size: 69, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:42:50,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=696220.0, ans=0.125 2024-08-10 18:43:05,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-08-10 18:43:08,324 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 18:43:38,035 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 18:43:59,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11700, loss[loss=0.1179, beats_loss=0.01045, ecapa_loss=0.0002462, whisper_loss=0.105, over 13902.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01181, ecapa_loss=0.0002328, whisper_loss=0.09558, over 3884312.77 frames. 
], batch size: 54, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:44:05,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=696720.0, ans=0.0 2024-08-10 18:44:07,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2024-08-10 18:44:30,800 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 18:44:32,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=696920.0, ans=0.0 2024-08-10 18:44:38,075 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 15 from LS+wenet, 29 from Vox, 47 fro AS 2024-08-10 18:44:39,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.923e+01 3.356e+01 3.959e+01 5.415e+01, threshold=6.712e+01, percent-clipped=0.0 2024-08-10 18:44:42,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=697020.0, ans=0.0 2024-08-10 18:44:44,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-10 18:44:45,700 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 18:44:47,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-10 18:44:49,220 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 18:44:52,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=697020.0, ans=0.0 2024-08-10 18:44:53,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-10 18:45:10,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11750, loss[loss=0.1144, beats_loss=0.01021, ecapa_loss=0.0002342, whisper_loss=0.1018, over 22032.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01182, ecapa_loss=0.0002317, whisper_loss=0.09599, over 3910022.22 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:45:10,545 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 18:45:16,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=697220.0, ans=0.0 2024-08-10 18:45:32,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=697320.0, ans=0.025 2024-08-10 18:45:44,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=697420.0, ans=0.1 2024-08-10 18:45:47,031 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 18:45:47,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=697420.0, ans=0.0 2024-08-10 18:45:54,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=697520.0, ans=0.0 2024-08-10 18:45:57,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=697520.0, ans=0.05 2024-08-10 18:46:06,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=697620.0, ans=0.125 2024-08-10 18:46:09,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=697620.0, ans=0.035 2024-08-10 18:46:15,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=697620.0, ans=0.125 2024-08-10 18:46:18,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0 2024-08-10 18:46:19,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11800, loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.000259, whisper_loss=0.0907, over 14897.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01186, ecapa_loss=0.000231, whisper_loss=0.09593, over 3891093.74 frames. 
], batch size: 58, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:46:49,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=697920.0, ans=0.0 2024-08-10 18:46:58,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 3.039e+01 3.457e+01 3.903e+01 6.365e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 18:47:08,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=698020.0, ans=0.0 2024-08-10 18:47:09,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=698020.0, ans=0.125 2024-08-10 18:47:13,613 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 18:47:21,143 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-10 18:47:24,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=698120.0, ans=0.0 2024-08-10 18:47:30,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11850, loss[loss=0.1222, beats_loss=0.01237, ecapa_loss=0.0001696, whisper_loss=0.1081, over 23093.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01188, ecapa_loss=0.0002301, whisper_loss=0.09483, over 3866031.61 frames. 
], batch size: 87, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:47:34,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=698220.0, ans=0.025 2024-08-10 18:47:42,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=698220.0, ans=0.025 2024-08-10 18:47:49,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=698320.0, ans=0.125 2024-08-10 18:48:11,267 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-10 18:48:11,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=698520.0, ans=0.09899494936611666 2024-08-10 18:48:12,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698520.0, ans=0.1 2024-08-10 18:48:15,096 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 18:48:29,942 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 18:48:35,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=698620.0, ans=0.0 2024-08-10 18:48:38,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=698720.0, ans=0.09899494936611666 2024-08-10 18:48:39,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11900, loss[loss=0.08327, beats_loss=0.01315, ecapa_loss=0.0002342, whisper_loss=0.06778, over 14911.00 frames. ], tot_loss[loss=0.109, beats_loss=0.012, ecapa_loss=0.0002275, whisper_loss=0.0947, over 3894941.96 frames. 
], batch size: 61, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:48:41,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-08-10 18:48:45,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=698720.0, ans=0.2 2024-08-10 18:48:46,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698720.0, ans=0.1 2024-08-10 18:48:49,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-10 18:48:50,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=698720.0, ans=0.125 2024-08-10 18:48:57,033 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 18:48:59,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698820.0, ans=0.1 2024-08-10 18:49:01,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=698820.0, ans=0.2 2024-08-10 18:49:07,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.49 vs. 
limit=22.5 2024-08-10 18:49:08,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=698920.0, ans=0.0 2024-08-10 18:49:17,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.855e+01 3.159e+01 3.498e+01 6.204e+01, threshold=6.318e+01, percent-clipped=0.0 2024-08-10 18:49:20,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=699020.0, ans=0.0 2024-08-10 18:49:34,673 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 18:49:43,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=699120.0, ans=0.2 2024-08-10 18:49:46,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 11950, loss[loss=0.09139, beats_loss=0.01512, ecapa_loss=0.0001896, whisper_loss=0.07437, over 18300.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01185, ecapa_loss=0.0002307, whisper_loss=0.09549, over 3890420.11 frames. ], batch size: 75, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:49:47,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.66 vs. limit=22.5 2024-08-10 18:50:00,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=699320.0, ans=0.0 2024-08-10 18:50:07,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=699320.0, ans=15.0 2024-08-10 18:50:13,437 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 18:50:29,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. 
limit=15.0 2024-08-10 18:50:40,608 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 18:50:41,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=699620.0, ans=0.0 2024-08-10 18:50:47,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-10 18:50:48,417 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 18:50:53,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12000, loss[loss=0.09959, beats_loss=0.01356, ecapa_loss=0.0002403, whisper_loss=0.08363, over 18380.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01182, ecapa_loss=0.0002309, whisper_loss=0.09511, over 3870553.09 frames. ], batch size: 78, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:50:53,564 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 18:51:35,546 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007279, whisper_loss=0.255, over 922467.00 frames. 2024-08-10 18:51:54,180 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on SV_voxceleb1: loss=0.006203, beats_loss=0, ecapa_loss=0.0006203, whisper_loss=0, over 939242.00 frames. 2024-08-10 18:53:47,283 INFO [train_multi_KD3.py:1149] (3/4) Epoch 5, validation on AT_audioset: loss=0.02662, beats_loss=0.02662, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 18:53:47,287 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 18:53:52,607 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-10 18:53:52,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=699720.0, ans=0.125 2024-08-10 18:53:55,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=699720.0, ans=0.125 2024-08-10 18:54:03,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=699820.0, ans=0.125 2024-08-10 18:54:24,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-10 18:54:25,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.732e+01 3.134e+01 3.531e+01 7.163e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 18:54:44,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=700120.0, ans=0.125 2024-08-10 18:54:48,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=700120.0, ans=0.125 2024-08-10 18:54:54,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12050, loss[loss=0.1012, beats_loss=0.01144, ecapa_loss=0.0002225, whisper_loss=0.08754, over 14526.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01176, ecapa_loss=0.0002316, whisper_loss=0.0949, over 3868316.84 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:54:59,114 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 18:54:59,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=700220.0, ans=0.07 2024-08-10 18:55:00,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=700220.0, ans=0.2 2024-08-10 18:55:23,204 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 18:55:28,507 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 18:55:28,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=700420.0, ans=0.125 2024-08-10 18:55:28,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=700420.0, ans=0.125 2024-08-10 18:55:36,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=700520.0, ans=0.125 2024-08-10 18:55:50,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=700620.0, ans=0.125 2024-08-10 18:56:02,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12100, loss[loss=0.08035, beats_loss=0.01566, ecapa_loss=0.0001994, whisper_loss=0.06269, over 14468.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01175, ecapa_loss=0.0002332, whisper_loss=0.09417, over 3827390.92 frames. 
], batch size: 59, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:56:04,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700720.0, ans=0.1 2024-08-10 18:56:04,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-08-10 18:56:18,545 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 18:56:20,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=700820.0, ans=0.0 2024-08-10 18:56:28,030 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 13 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 18:56:30,460 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 18:56:40,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.774e+01 3.188e+01 3.789e+01 5.825e+01, threshold=6.376e+01, percent-clipped=0.0 2024-08-10 18:57:09,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12150, loss[loss=0.1048, beats_loss=0.0133, ecapa_loss=0.0002169, whisper_loss=0.08933, over 17688.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01185, ecapa_loss=0.0002324, whisper_loss=0.094, over 3827372.97 frames. ], batch size: 71, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:57:18,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=701220.0, ans=0.0 2024-08-10 18:57:25,762 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:57:25,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.13 vs. 
limit=15.0 2024-08-10 18:57:44,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=701420.0, ans=15.0 2024-08-10 18:58:03,319 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 18:58:06,164 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 18:58:15,488 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 18:58:17,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12200, loss[loss=0.1122, beats_loss=0.01221, ecapa_loss=0.0001893, whisper_loss=0.09812, over 16223.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01188, ecapa_loss=0.0002316, whisper_loss=0.09423, over 3838301.33 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:58:37,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-10 18:58:38,460 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 18:58:41,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. 
limit=15.0 2024-08-10 18:58:55,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.903e+01 3.177e+01 3.659e+01 7.236e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 18:58:58,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=702020.0, ans=0.125 2024-08-10 18:59:11,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=702120.0, ans=0.0 2024-08-10 18:59:11,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2024-08-10 18:59:24,067 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-10 18:59:24,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=702220.0, ans=0.125 2024-08-10 18:59:25,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12250, loss[loss=0.1165, beats_loss=0.009527, ecapa_loss=0.0002978, whisper_loss=0.1039, over 18757.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01179, ecapa_loss=0.0002322, whisper_loss=0.09416, over 3857264.32 frames. ], batch size: 74, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:59:42,611 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 18:59:59,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702420.0, ans=0.0 2024-08-10 18:59:59,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=702420.0, ans=0.2 2024-08-10 19:00:11,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=702520.0, ans=0.125 2024-08-10 19:00:31,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=702720.0, ans=0.125 2024-08-10 19:00:32,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12300, loss[loss=0.1376, beats_loss=0.009867, ecapa_loss=0.0002325, whisper_loss=0.1254, over 23100.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01177, ecapa_loss=0.0002327, whisper_loss=0.09404, over 3840688.67 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:00:39,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702720.0, ans=0.1 2024-08-10 19:00:47,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=702820.0, ans=0.0 2024-08-10 19:00:48,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=702820.0, ans=0.0 2024-08-10 19:00:50,253 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 19:00:54,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=702820.0, ans=0.125 2024-08-10 19:01:06,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=702920.0, ans=0.015 2024-08-10 19:01:10,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.851e+01 3.322e+01 3.771e+01 6.110e+01, threshold=6.644e+01, percent-clipped=0.0 2024-08-10 19:01:22,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.57 vs. limit=15.0 2024-08-10 19:01:39,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12350, loss[loss=0.09181, beats_loss=0.01339, ecapa_loss=0.000268, whisper_loss=0.07573, over 13243.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01175, ecapa_loss=0.0002329, whisper_loss=0.09421, over 3829035.23 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:02:23,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-08-10 19:02:35,505 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 19:02:41,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=703620.0, ans=0.125 2024-08-10 19:02:50,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-08-10 19:02:52,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12400, loss[loss=0.1167, beats_loss=0.008795, ecapa_loss=0.0002423, whisper_loss=0.1055, over 18893.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01179, ecapa_loss=0.0002306, whisper_loss=0.09454, over 3837981.46 frames. ], batch size: 75, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:02:53,526 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:03:03,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=703720.0, ans=0.0 2024-08-10 19:03:11,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=703820.0, ans=0.0 2024-08-10 19:03:13,287 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 19:03:13,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=703820.0, ans=0.125 2024-08-10 19:03:18,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-10 19:03:31,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.592e+01 3.077e+01 3.649e+01 6.276e+01, threshold=6.154e+01, percent-clipped=0.0 2024-08-10 19:03:32,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-08-10 19:03:44,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=704020.0, ans=0.0 2024-08-10 19:04:00,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=704120.0, ans=0.0 2024-08-10 19:04:03,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12450, loss[loss=0.1182, beats_loss=0.009598, ecapa_loss=0.0002001, whisper_loss=0.1066, over 16228.00 frames. 
], tot_loss[loss=0.109, beats_loss=0.01175, ecapa_loss=0.0002306, whisper_loss=0.09496, over 3848081.70 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:04:11,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=704220.0, ans=0.125 2024-08-10 19:04:14,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=704220.0, ans=0.125 2024-08-10 19:04:14,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=704220.0, ans=0.125 2024-08-10 19:04:25,834 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 19:05:03,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-08-10 19:05:05,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704620.0, ans=0.1 2024-08-10 19:05:07,263 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 19:05:12,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12500, loss[loss=0.1125, beats_loss=0.009755, ecapa_loss=0.0002578, whisper_loss=0.1002, over 20763.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0118, ecapa_loss=0.0002299, whisper_loss=0.09457, over 3870512.76 frames. 
], batch size: 84, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:05:17,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=704720.0, ans=0.0 2024-08-10 19:05:17,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704720.0, ans=0.1 2024-08-10 19:05:31,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=704820.0, ans=0.0 2024-08-10 19:05:46,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=704920.0, ans=0.0 2024-08-10 19:05:48,939 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 19:05:51,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.864e+01 3.204e+01 3.870e+01 6.784e+01, threshold=6.407e+01, percent-clipped=3.0 2024-08-10 19:06:06,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=12.0 2024-08-10 19:06:07,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=705120.0, ans=0.2 2024-08-10 19:06:16,675 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 19:06:21,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12550, loss[loss=0.1198, beats_loss=0.01195, ecapa_loss=0.0002159, whisper_loss=0.1057, over 23068.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01181, ecapa_loss=0.00023, whisper_loss=0.09524, over 3910274.58 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:06:22,521 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 19:06:26,524 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 19:06:28,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=705220.0, ans=0.125 2024-08-10 19:06:41,178 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 19:06:43,966 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 19:06:49,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=705420.0, ans=0.125 2024-08-10 19:06:52,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=705420.0, ans=0.2 2024-08-10 19:06:53,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=705420.0, ans=0.125 2024-08-10 19:06:58,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=705420.0, ans=0.025 2024-08-10 19:07:04,771 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 19:07:23,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-10 19:07:34,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12600, loss[loss=0.1083, beats_loss=0.01101, ecapa_loss=0.0002184, whisper_loss=0.09514, over 17034.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01186, ecapa_loss=0.0002312, whisper_loss=0.09494, over 3881965.56 frames. 
], batch size: 66, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:07:48,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=705820.0, ans=0.0 2024-08-10 19:07:49,433 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-10 19:07:54,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=705820.0, ans=0.125 2024-08-10 19:08:02,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=705920.0, ans=0.0 2024-08-10 19:08:08,586 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 19:08:10,163 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 19:08:12,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.787e+01 3.074e+01 3.484e+01 6.689e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-10 19:08:22,227 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 19:08:22,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=706020.0, ans=0.125 2024-08-10 19:08:25,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706020.0, ans=0.1 2024-08-10 19:08:26,427 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 19:08:31,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.53 vs. 
limit=15.0 2024-08-10 19:08:42,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12650, loss[loss=0.1144, beats_loss=0.01107, ecapa_loss=0.000216, whisper_loss=0.1011, over 23532.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0119, ecapa_loss=0.0002294, whisper_loss=0.09454, over 3898702.46 frames. ], batch size: 91, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:08:42,184 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 19:08:45,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=706220.0, ans=0.07 2024-08-10 19:08:46,463 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 19:09:00,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=706320.0, ans=10.0 2024-08-10 19:09:06,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2024-08-10 19:09:10,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=706420.0, ans=0.125 2024-08-10 19:09:23,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=706520.0, ans=0.0 2024-08-10 19:09:30,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=706520.0, ans=0.125 2024-08-10 19:09:32,023 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 19:09:49,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12700, loss[loss=0.1185, beats_loss=0.009326, ecapa_loss=0.0002514, whisper_loss=0.1067, over 16785.00 frames. 
], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.0002305, whisper_loss=0.095, over 3879909.85 frames. ], batch size: 65, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:09:49,470 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 19:09:49,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=706720.0, ans=0.125 2024-08-10 19:10:11,051 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 19:10:17,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=706920.0, ans=0.125 2024-08-10 19:10:18,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=706920.0, ans=0.07 2024-08-10 19:10:27,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.805e+01 3.118e+01 3.753e+01 7.808e+01, threshold=6.236e+01, percent-clipped=1.0 2024-08-10 19:10:34,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=707020.0, ans=0.125 2024-08-10 19:10:34,659 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.533e-03 2024-08-10 19:10:41,187 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 19:10:57,196 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12750, loss[loss=0.09093, beats_loss=0.01193, ecapa_loss=0.0002853, whisper_loss=0.07615, over 15148.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01192, ecapa_loss=0.0002304, whisper_loss=0.09493, over 3880351.37 frames. 
], batch size: 65, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:11:07,384 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.078e-01 2024-08-10 19:11:11,095 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-10 19:11:28,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2024-08-10 19:11:35,859 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 19:11:37,236 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 19:11:59,299 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 19:12:03,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=707720.0, ans=0.0 2024-08-10 19:12:04,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12800, loss[loss=0.127, beats_loss=0.007722, ecapa_loss=0.0002894, whisper_loss=0.1164, over 18834.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01189, ecapa_loss=0.0002319, whisper_loss=0.09528, over 3896474.93 frames. ], batch size: 76, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:12:10,729 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.596e-02 2024-08-10 19:12:17,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=707820.0, ans=0.125 2024-08-10 19:12:22,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=707820.0, ans=0.2 2024-08-10 19:12:35,278 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 19:12:41,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.755e+01 3.116e+01 3.558e+01 5.514e+01, threshold=6.233e+01, percent-clipped=0.0 2024-08-10 19:12:43,631 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-10 19:12:44,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=708020.0, ans=0.0 2024-08-10 19:12:55,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-08-10 19:13:11,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12850, loss[loss=0.0598, beats_loss=0.0149, ecapa_loss=0.0001767, whisper_loss=0.04314, over 15417.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.000231, whisper_loss=0.095, over 3856309.09 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:13:15,515 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 19:13:19,648 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 19:13:24,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=708320.0, ans=0.125 2024-08-10 19:13:24,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=708320.0, ans=0.1 2024-08-10 19:13:34,250 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.728e-01 2024-08-10 19:13:47,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=708420.0, ans=10.0 2024-08-10 19:13:52,662 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 19:14:16,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=708620.0, ans=0.09899494936611666 2024-08-10 19:14:18,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12900, loss[loss=0.1068, beats_loss=0.01103, ecapa_loss=0.0002146, whisper_loss=0.09359, over 15339.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01195, ecapa_loss=0.000231, whisper_loss=0.09454, over 3870139.74 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:14:26,984 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 9 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 19:14:27,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. 
limit=15.0 2024-08-10 19:14:31,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=708820.0, ans=0.0 2024-08-10 19:14:35,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-10 19:14:44,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=708920.0, ans=0.125 2024-08-10 19:14:55,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.870e+01 3.277e+01 3.550e+01 6.009e+01, threshold=6.554e+01, percent-clipped=0.0 2024-08-10 19:14:59,158 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 19:15:03,372 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 19:15:09,898 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 19:15:13,872 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 19:15:15,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-08-10 19:15:24,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 12950, loss[loss=0.1103, beats_loss=0.01112, ecapa_loss=0.0003027, whisper_loss=0.09616, over 20619.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01187, ecapa_loss=0.0002321, whisper_loss=0.09453, over 3863985.70 frames. 
], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:15:26,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709220.0, ans=0.1 2024-08-10 19:15:27,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=709220.0, ans=0.125 2024-08-10 19:15:40,505 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.715e-01 2024-08-10 19:15:46,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709320.0, ans=0.125 2024-08-10 19:15:58,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=709420.0, ans=0.125 2024-08-10 19:16:12,485 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 19:16:13,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=709520.0, ans=0.04949747468305833 2024-08-10 19:16:23,534 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 19:16:25,186 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.290e-01 2024-08-10 19:16:30,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13000, loss[loss=0.08959, beats_loss=0.01576, ecapa_loss=0.0001956, whisper_loss=0.07187, over 19942.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01183, ecapa_loss=0.0002314, whisper_loss=0.095, over 3899017.26 frames. 
], batch size: 82, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:16:30,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=709720.0, ans=0.035 2024-08-10 19:16:38,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0 2024-08-10 19:16:45,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2024-08-10 19:17:01,139 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 19:17:06,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.942e+01 3.329e+01 3.753e+01 5.609e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 19:17:07,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-10 19:17:10,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=710020.0, ans=0.125 2024-08-10 19:17:11,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=710020.0, ans=0.95 2024-08-10 19:17:11,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710020.0, ans=0.1 2024-08-10 19:17:20,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. 
limit=12.0 2024-08-10 19:17:28,954 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:17:35,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13050, loss[loss=0.09874, beats_loss=0.01441, ecapa_loss=0.0002225, whisper_loss=0.08211, over 20884.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002313, whisper_loss=0.09489, over 3891644.01 frames. ], batch size: 86, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:17:42,636 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 19:17:45,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=12.0 2024-08-10 19:18:10,643 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 19:18:25,102 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 19:18:35,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=710620.0, ans=0.0 2024-08-10 19:18:42,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13100, loss[loss=0.08407, beats_loss=0.01514, ecapa_loss=0.0001959, whisper_loss=0.06697, over 17707.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0119, ecapa_loss=0.0002309, whisper_loss=0.09429, over 3894133.21 frames. ], batch size: 74, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:19:01,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.33 vs. 
limit=12.0 2024-08-10 19:19:05,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710820.0, ans=0.125 2024-08-10 19:19:18,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=710920.0, ans=0.0 2024-08-10 19:19:19,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.880e+01 3.300e+01 3.880e+01 5.965e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-10 19:19:23,901 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 19:19:24,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=711020.0, ans=0.125 2024-08-10 19:19:29,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=711020.0, ans=0.5 2024-08-10 19:19:39,530 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 40 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 19:19:42,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711120.0, ans=0.125 2024-08-10 19:19:45,021 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 19:19:46,363 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 19:19:48,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13150, loss[loss=0.09497, beats_loss=0.01307, ecapa_loss=0.0002161, whisper_loss=0.07973, over 17274.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.0002321, whisper_loss=0.095, over 3900286.32 frames. 
], batch size: 69, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:19:54,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=711220.0, ans=0.125 2024-08-10 19:20:11,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=711320.0, ans=0.125 2024-08-10 19:20:16,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-10 19:20:20,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711420.0, ans=0.125 2024-08-10 19:20:22,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=711420.0, ans=0.2 2024-08-10 19:20:28,688 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 19:20:35,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=711520.0, ans=0.0 2024-08-10 19:20:38,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=711520.0, ans=0.0 2024-08-10 19:20:55,163 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 19:20:59,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13200, loss[loss=0.112, beats_loss=0.009925, ecapa_loss=0.0002705, whisper_loss=0.09939, over 16394.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01179, ecapa_loss=0.0002316, whisper_loss=0.09581, over 3881636.70 frames. 
], batch size: 65, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:20:59,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=711720.0, ans=0.035 2024-08-10 19:21:24,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2024-08-10 19:21:37,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.006e+01 3.463e+01 3.966e+01 7.207e+01, threshold=6.927e+01, percent-clipped=1.0 2024-08-10 19:21:39,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=712020.0, ans=0.125 2024-08-10 19:21:54,372 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 19:21:54,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=712120.0, ans=0.125 2024-08-10 19:22:02,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=712120.0, ans=0.0 2024-08-10 19:22:03,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=712120.0, ans=0.2 2024-08-10 19:22:05,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13250, loss[loss=0.1266, beats_loss=0.01104, ecapa_loss=0.0002221, whisper_loss=0.1133, over 17033.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01176, ecapa_loss=0.0002316, whisper_loss=0.0957, over 3887901.49 frames. 
], batch size: 64, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:22:06,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=712220.0, ans=0.125 2024-08-10 19:22:13,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=712220.0, ans=0.125 2024-08-10 19:22:33,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=712420.0, ans=0.125 2024-08-10 19:22:35,574 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 19:22:37,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=8.0 2024-08-10 19:22:38,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=712420.0, ans=0.2 2024-08-10 19:22:38,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=712420.0, ans=0.125 2024-08-10 19:22:45,820 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 19:22:48,388 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 19:22:54,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=712520.0, ans=0.125 2024-08-10 19:23:11,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13300, loss[loss=0.1069, beats_loss=0.01068, ecapa_loss=0.0002451, whisper_loss=0.09374, over 15967.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01166, ecapa_loss=0.0002326, whisper_loss=0.09596, over 3888212.57 frames. ], batch size: 62, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:23:24,044 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 19:23:26,572 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 19:23:29,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=712820.0, ans=0.125 2024-08-10 19:23:29,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2024-08-10 19:23:31,660 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 19:23:39,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=712920.0, ans=0.125 2024-08-10 19:23:42,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=712920.0, ans=0.0 2024-08-10 19:23:46,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. limit=10.0 2024-08-10 19:23:50,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=712920.0, ans=15.0 2024-08-10 19:23:50,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.791e+01 3.143e+01 3.422e+01 5.648e+01, threshold=6.287e+01, percent-clipped=0.0 2024-08-10 19:24:03,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=713020.0, ans=0.125 2024-08-10 19:24:16,411 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 19:24:19,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=713220.0, ans=0.125 2024-08-10 19:24:19,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13350, loss[loss=0.1277, beats_loss=0.01253, ecapa_loss=0.0002494, whisper_loss=0.1127, over 23393.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01171, ecapa_loss=0.000232, whisper_loss=0.09544, over 3885587.13 frames. ], batch size: 94, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:24:20,025 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 9 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 19:24:26,974 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 19:24:32,230 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 19:24:34,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=713320.0, ans=0.05 2024-08-10 19:24:38,601 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 19:24:45,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-10 19:24:58,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=713420.0, ans=0.0 2024-08-10 19:25:04,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-10 19:25:06,345 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
33 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 19:25:24,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=713620.0, ans=0.05 2024-08-10 19:25:26,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=713720.0, ans=0.125 2024-08-10 19:25:26,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13400, loss[loss=0.09486, beats_loss=0.01398, ecapa_loss=0.0002146, whisper_loss=0.07873, over 19296.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01172, ecapa_loss=0.0002308, whisper_loss=0.09566, over 3872638.25 frames. ], batch size: 80, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:25:28,542 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 19:25:36,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-10 19:26:05,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.716e+01 3.114e+01 3.677e+01 5.856e+01, threshold=6.229e+01, percent-clipped=0.0 2024-08-10 19:26:08,232 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 19:26:11,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0 2024-08-10 19:26:12,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=714020.0, ans=0.125 2024-08-10 19:26:21,909 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 19:26:31,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=714120.0, ans=0.05 2024-08-10 19:26:36,576 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:26:38,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13450, loss[loss=0.1041, beats_loss=0.01186, ecapa_loss=0.0002564, whisper_loss=0.08967, over 21141.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01183, ecapa_loss=0.0002311, whisper_loss=0.09437, over 3862457.59 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:26:41,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=714220.0, ans=0.125 2024-08-10 19:26:46,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.87 vs. limit=10.0 2024-08-10 19:27:00,535 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 19:27:05,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=714320.0, ans=0.0 2024-08-10 19:27:21,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=714420.0, ans=0.125 2024-08-10 19:27:41,775 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 19:27:48,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=714520.0, ans=0.05 2024-08-10 19:27:53,847 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 19:27:56,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=714620.0, ans=0.2 2024-08-10 19:28:06,828 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:28:17,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2024-08-10 19:28:18,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13500, loss[loss=0.1212, beats_loss=0.01042, ecapa_loss=0.0002644, whisper_loss=0.1082, over 22642.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01181, ecapa_loss=0.0002336, whisper_loss=0.09452, over 3847877.65 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:28:18,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=714720.0, ans=0.125 2024-08-10 19:28:24,594 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 19:28:49,024 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-10 19:29:09,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=714920.0, ans=0.125 2024-08-10 19:29:09,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2024-08-10 19:29:11,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.843e+01 3.302e+01 3.860e+01 1.367e+02, threshold=6.604e+01, percent-clipped=1.0 2024-08-10 19:29:11,425 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 19:29:33,429 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 19:29:43,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13550, loss[loss=0.1354, beats_loss=0.01103, ecapa_loss=0.0002317, whisper_loss=0.1221, over 24089.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002336, whisper_loss=0.09487, over 3850770.54 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:29:45,070 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 19:29:52,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=715220.0, ans=0.125 2024-08-10 19:30:03,379 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 19:30:19,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=715420.0, ans=0.2 2024-08-10 19:30:22,579 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 19:30:26,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=715520.0, ans=0.0 2024-08-10 19:30:34,928 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 19:30:56,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13600, loss[loss=0.1007, beats_loss=0.01082, ecapa_loss=0.0002429, whisper_loss=0.08747, over 21894.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01172, ecapa_loss=0.0002335, whisper_loss=0.09534, over 3883388.12 frames. ], batch size: 90, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:31:06,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=15.0 2024-08-10 19:31:12,801 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 19:31:17,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-10 19:31:20,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=715820.0, ans=0.125 2024-08-10 19:31:25,359 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 19:31:40,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 3.018e+01 3.345e+01 4.176e+01 9.829e+01, threshold=6.690e+01, percent-clipped=2.0 2024-08-10 19:31:50,327 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 19:31:51,833 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 19:32:03,261 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 19:32:12,169 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 19:32:13,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13650, loss[loss=0.09716, beats_loss=0.01339, ecapa_loss=0.0002357, whisper_loss=0.08142, over 16857.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01174, ecapa_loss=0.0002348, whisper_loss=0.09522, over 3850820.63 frames. 
], batch size: 66, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:32:19,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=716220.0, ans=0.2 2024-08-10 19:32:38,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=716320.0, ans=0.125 2024-08-10 19:32:42,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=716420.0, ans=0.95 2024-08-10 19:33:19,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0 2024-08-10 19:33:23,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716620.0, ans=0.125 2024-08-10 19:33:28,297 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 19:33:29,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13700, loss[loss=0.1172, beats_loss=0.01091, ecapa_loss=0.0002196, whisper_loss=0.1041, over 23715.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01172, ecapa_loss=0.0002341, whisper_loss=0.09511, over 3839074.55 frames. 
], batch size: 93, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:33:30,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=716720.0, ans=0.125 2024-08-10 19:34:12,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.823e+01 3.317e+01 3.890e+01 6.067e+01, threshold=6.634e+01, percent-clipped=0.0 2024-08-10 19:34:14,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=717020.0, ans=0.2 2024-08-10 19:34:17,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-08-10 19:34:26,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=717020.0, ans=0.125 2024-08-10 19:34:26,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=717020.0, ans=0.0 2024-08-10 19:34:46,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13750, loss[loss=0.1124, beats_loss=0.01158, ecapa_loss=0.0002039, whisper_loss=0.09882, over 21996.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01178, ecapa_loss=0.0002338, whisper_loss=0.09495, over 3864355.52 frames. ], batch size: 87, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:35:04,984 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 19:35:09,465 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 19:35:12,547 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 19:35:15,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-08-10 19:35:41,254 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 19:35:43,090 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 19:35:44,526 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 19:35:44,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=717520.0, ans=0.125 2024-08-10 19:35:48,545 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 19:36:02,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13800, loss[loss=0.09945, beats_loss=0.01391, ecapa_loss=0.0002662, whisper_loss=0.08288, over 17765.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01183, ecapa_loss=0.0002328, whisper_loss=0.09484, over 3882538.37 frames. ], batch size: 75, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:36:16,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=717820.0, ans=0.125 2024-08-10 19:36:38,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-10 19:36:45,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=717920.0, ans=0.125 2024-08-10 19:36:46,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+01 2.754e+01 3.224e+01 3.629e+01 6.153e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-10 19:37:21,055 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13850, loss[loss=0.1057, beats_loss=0.01106, ecapa_loss=0.0002303, whisper_loss=0.09238, over 17305.00 frames. 
], tot_loss[loss=0.1095, beats_loss=0.01173, ecapa_loss=0.0002337, whisper_loss=0.0954, over 3899516.48 frames. ], batch size: 68, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:37:31,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2024-08-10 19:37:33,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=718220.0, ans=0.05 2024-08-10 19:37:41,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=718320.0, ans=0.0 2024-08-10 19:37:58,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=718420.0, ans=0.125 2024-08-10 19:38:00,354 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.375e-01 2024-08-10 19:38:33,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2024-08-10 19:38:41,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13900, loss[loss=0.1111, beats_loss=0.01207, ecapa_loss=0.0002779, whisper_loss=0.09623, over 14469.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01171, ecapa_loss=0.0002327, whisper_loss=0.09616, over 3881339.56 frames. 
], batch size: 59, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:39:19,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=718920.0, ans=0.0 2024-08-10 19:39:24,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.959e+01 3.276e+01 3.717e+01 7.288e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 19:39:42,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=719120.0, ans=0.125 2024-08-10 19:39:49,774 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 16 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 19:39:57,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 13950, loss[loss=0.1011, beats_loss=0.01211, ecapa_loss=0.0002512, whisper_loss=0.08651, over 17092.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01174, ecapa_loss=0.000232, whisper_loss=0.09601, over 3867338.18 frames. ], batch size: 71, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:40:10,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719220.0, ans=0.125 2024-08-10 19:40:15,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=719320.0, ans=0.125 2024-08-10 19:40:28,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=719420.0, ans=0.0 2024-08-10 19:40:43,021 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 19:40:45,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.68 vs. 
limit=15.0 2024-08-10 19:40:46,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=719520.0, ans=0.125 2024-08-10 19:40:55,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=719520.0, ans=0.07 2024-08-10 19:41:13,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14000, loss[loss=0.1072, beats_loss=0.01197, ecapa_loss=0.0002255, whisper_loss=0.09295, over 21833.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01173, ecapa_loss=0.0002308, whisper_loss=0.09582, over 3868180.99 frames. ], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:41:18,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=719720.0, ans=0.125 2024-08-10 19:41:24,205 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 19:41:24,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=719720.0, ans=0.025 2024-08-10 19:41:53,222 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-10 19:41:53,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=719920.0, ans=0.125 2024-08-10 19:41:59,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.816e+01 3.391e+01 3.815e+01 6.287e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 19:42:01,213 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 19:42:08,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.64 vs. 
limit=15.0 2024-08-10 19:42:14,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-10 19:42:17,001 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 19:42:23,672 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 19:42:34,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14050, loss[loss=0.1063, beats_loss=0.01375, ecapa_loss=0.0002474, whisper_loss=0.09012, over 17260.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01175, ecapa_loss=0.0002302, whisper_loss=0.09581, over 3858693.74 frames. ], batch size: 69, lr: 1.18e-02, grad_scale: 2199023255552.0 2024-08-10 19:42:41,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=15.0 2024-08-10 19:42:53,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720320.0, ans=0.125 2024-08-10 19:43:03,104 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:43:18,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=720420.0, ans=0.125 2024-08-10 19:43:22,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.21 vs. limit=10.0 2024-08-10 19:43:51,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14100, loss[loss=0.1058, beats_loss=0.01332, ecapa_loss=0.0002302, whisper_loss=0.09015, over 22294.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.0118, ecapa_loss=0.0002303, whisper_loss=0.09511, over 3875828.09 frames. 
], batch size: 90, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:44:10,928 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-10 19:44:20,077 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 19:44:27,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=720920.0, ans=0.0 2024-08-10 19:44:32,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.752e+01 3.141e+01 3.762e+01 7.016e+01, threshold=6.282e+01, percent-clipped=2.0 2024-08-10 19:44:47,560 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 19:44:47,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=721020.0, ans=0.125 2024-08-10 19:44:49,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:44:49,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=721020.0, ans=0.0 2024-08-10 19:44:53,696 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 19:44:55,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=721120.0, ans=0.125 2024-08-10 19:45:01,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=721120.0, ans=0.07 2024-08-10 19:45:06,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14150, loss[loss=0.1093, beats_loss=0.01134, ecapa_loss=0.0002233, whisper_loss=0.09576, over 20264.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01175, ecapa_loss=0.0002302, whisper_loss=0.09558, over 3864912.46 frames. 
], batch size: 81, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:45:11,565 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 19:45:24,710 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 19:45:29,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=721320.0, ans=0.0 2024-08-10 19:45:29,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=721320.0, ans=0.125 2024-08-10 19:45:51,733 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 19:45:56,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=721520.0, ans=0.2 2024-08-10 19:45:57,784 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 19:46:05,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=721620.0, ans=0.2 2024-08-10 19:46:08,773 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 19:46:21,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14200, loss[loss=0.09272, beats_loss=0.01421, ecapa_loss=0.0002424, whisper_loss=0.07609, over 21603.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0118, ecapa_loss=0.0002296, whisper_loss=0.09503, over 3892612.19 frames. ], batch size: 93, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:46:25,567 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 19:46:27,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. 
limit=15.0 2024-08-10 19:46:37,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=721820.0, ans=0.0 2024-08-10 19:46:38,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=721820.0, ans=0.125 2024-08-10 19:46:41,808 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-10 19:46:43,373 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 19:46:48,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-10 19:47:04,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.830e+01 3.191e+01 3.752e+01 5.497e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-10 19:47:11,303 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 19:47:11,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=722020.0, ans=0.2 2024-08-10 19:47:28,511 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 19:47:29,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=722120.0, ans=0.0 2024-08-10 19:47:38,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14250, loss[loss=0.1256, beats_loss=0.01157, ecapa_loss=0.0002199, whisper_loss=0.1118, over 16928.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0118, ecapa_loss=0.000228, whisper_loss=0.09521, over 3904313.19 frames. 
], batch size: 63, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:47:44,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722220.0, ans=0.125 2024-08-10 19:47:45,677 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 19:47:54,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-10 19:48:15,912 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 19:48:24,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-10 19:48:28,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2024-08-10 19:48:29,621 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 19:48:30,977 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 19:48:43,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722620.0, ans=0.1 2024-08-10 19:48:56,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14300, loss[loss=0.07945, beats_loss=0.01547, ecapa_loss=0.0001994, whisper_loss=0.06199, over 14140.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01187, ecapa_loss=0.0002292, whisper_loss=0.09457, over 3889477.85 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:48:58,366 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 19:49:17,747 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-10 19:49:40,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=722920.0, ans=0.0 2024-08-10 19:49:40,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.839e+01 3.149e+01 3.823e+01 7.710e+01, threshold=6.298e+01, percent-clipped=1.0 2024-08-10 19:49:57,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=723020.0, ans=0.2 2024-08-10 19:50:03,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=723120.0, ans=0.125 2024-08-10 19:50:06,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=723120.0, ans=0.0 2024-08-10 19:50:15,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14350, loss[loss=0.1253, beats_loss=0.006647, ecapa_loss=0.0003438, whisper_loss=0.1152, over 15958.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01189, ecapa_loss=0.0002302, whisper_loss=0.09451, over 3891989.86 frames. ], batch size: 67, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:50:44,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=723420.0, ans=0.04949747468305833 2024-08-10 19:50:50,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2024-08-10 19:50:57,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=723420.0, ans=0.125 2024-08-10 19:50:58,593 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 19:51:11,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=723520.0, ans=0.1 2024-08-10 19:51:26,654 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 19:51:29,432 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 19:51:30,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14400, loss[loss=0.1183, beats_loss=0.01008, ecapa_loss=0.000259, whisper_loss=0.1056, over 19421.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01184, ecapa_loss=0.0002315, whisper_loss=0.0951, over 3900602.44 frames. ], batch size: 78, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:51:36,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=723720.0, ans=0.125 2024-08-10 19:51:39,401 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 19:51:53,776 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 19:52:11,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.768e+01 3.038e+01 3.446e+01 5.868e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-10 19:52:27,913 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:52:31,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=724120.0, ans=0.0 2024-08-10 19:52:47,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 5, batch 14450, loss[loss=0.09952, beats_loss=0.01116, ecapa_loss=0.0002515, whisper_loss=0.08585, over 18003.00 frames. 
], tot_loss[loss=0.1097, beats_loss=0.01185, ecapa_loss=0.000233, whisper_loss=0.09552, over 3926250.34 frames. ], batch size: 71, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:53:09,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=724320.0, ans=0.125 2024-08-10 19:53:09,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-10 19:54:34,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 0, loss[loss=0.1149, beats_loss=0.01296, ecapa_loss=0.0001961, whisper_loss=0.09994, over 18916.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01296, ecapa_loss=0.0001961, whisper_loss=0.09994, over 18916.00 frames. ], batch size: 73, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:54:34,068 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 19:55:10,699 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on ASR_libri: loss=0.2614, beats_loss=0, ecapa_loss=0.0007237, whisper_loss=0.2541, over 922467.00 frames. 2024-08-10 19:55:26,809 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on SV_voxceleb1: loss=0.006205, beats_loss=0, ecapa_loss=0.0006205, whisper_loss=0, over 939242.00 frames. 2024-08-10 19:57:12,790 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on AT_audioset: loss=0.02628, beats_loss=0.02628, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 19:57:12,793 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 19:57:14,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724650.0, ans=0.125 2024-08-10 19:57:41,366 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
35 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 19:57:41,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=724750.0, ans=0.2 2024-08-10 19:57:41,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724750.0, ans=0.0 2024-08-10 19:57:44,282 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 19:57:56,878 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 19:58:19,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724850.0, ans=0.1 2024-08-10 19:58:19,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0 2024-08-10 19:58:37,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=724950.0, ans=0.125 2024-08-10 19:58:40,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.034e+01 3.419e+01 4.003e+01 7.099e+01, threshold=6.838e+01, percent-clipped=1.0 2024-08-10 19:58:46,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=724950.0, ans=0.0 2024-08-10 19:59:02,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=725050.0, ans=0.2 2024-08-10 19:59:15,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 50, loss[loss=0.1106, beats_loss=0.0119, ecapa_loss=0.0002461, whisper_loss=0.09619, over 23192.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01111, ecapa_loss=0.0002469, whisper_loss=0.09549, over 925590.64 frames. 
], batch size: 92, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:59:28,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=725150.0, ans=0.0 2024-08-10 19:59:29,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=725150.0, ans=0.125 2024-08-10 19:59:37,273 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 19:59:50,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725250.0, ans=0.125 2024-08-10 20:00:21,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725350.0, ans=0.0 2024-08-10 20:00:21,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=725350.0, ans=0.05 2024-08-10 20:00:25,600 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.954e+00 2024-08-10 20:00:25,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2024-08-10 20:00:41,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=725450.0, ans=0.125 2024-08-10 20:01:07,183 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 20:01:09,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 100, loss[loss=0.1042, beats_loss=0.01133, ecapa_loss=0.000225, whisper_loss=0.09062, over 23378.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01121, ecapa_loss=0.000241, whisper_loss=0.0947, over 1562619.72 frames. 
], batch size: 94, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:01:22,354 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 20:02:03,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=725850.0, ans=0.95 2024-08-10 20:02:28,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.882e+01 3.222e+01 3.754e+01 5.300e+01, threshold=6.444e+01, percent-clipped=0.0 2024-08-10 20:02:41,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726050.0, ans=0.1 2024-08-10 20:02:42,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0 2024-08-10 20:02:58,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 150, loss[loss=0.08947, beats_loss=0.01012, ecapa_loss=0.0002181, whisper_loss=0.07716, over 15566.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01132, ecapa_loss=0.0002346, whisper_loss=0.09386, over 2054514.38 frames. ], batch size: 59, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:03:11,793 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 20:03:32,458 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 20:03:42,193 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 20:03:51,058 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 20:03:57,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=726450.0, ans=0.125 2024-08-10 20:03:58,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-10 20:04:09,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2024-08-10 20:04:13,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726550.0, ans=0.1 2024-08-10 20:04:19,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=726550.0, ans=0.0 2024-08-10 20:04:20,351 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 20:04:22,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 200, loss[loss=0.1047, beats_loss=0.01189, ecapa_loss=0.0002154, whisper_loss=0.09061, over 16795.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01139, ecapa_loss=0.0002315, whisper_loss=0.09341, over 2424055.31 frames. ], batch size: 67, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:04:27,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=726650.0, ans=0.2 2024-08-10 20:04:31,352 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 28 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-10 20:04:53,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. 
limit=22.5 2024-08-10 20:05:02,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=726850.0, ans=0.125 2024-08-10 20:05:07,602 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.919e+00 2024-08-10 20:05:18,135 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 20:05:19,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.639e+01 2.951e+01 3.334e+01 6.571e+01, threshold=5.903e+01, percent-clipped=1.0 2024-08-10 20:05:26,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=727050.0, ans=0.0 2024-08-10 20:05:41,611 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 250, loss[loss=0.09963, beats_loss=0.01314, ecapa_loss=0.0001829, whisper_loss=0.08467, over 20149.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01148, ecapa_loss=0.0002286, whisper_loss=0.09396, over 2755673.29 frames. ], batch size: 79, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:05:55,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727250.0, ans=0.1 2024-08-10 20:06:09,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727350.0, ans=0.1 2024-08-10 20:06:12,158 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 20:06:21,290 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 20:06:36,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. 
limit=15.0 2024-08-10 20:06:51,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=727550.0, ans=0.125 2024-08-10 20:06:52,447 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 20:06:53,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 300, loss[loss=0.09799, beats_loss=0.01081, ecapa_loss=0.0002388, whisper_loss=0.08479, over 19475.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01139, ecapa_loss=0.0002272, whisper_loss=0.09459, over 2978819.84 frames. ], batch size: 75, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:06:56,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:06:57,129 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-10 20:07:00,820 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 17 from LS+wenet, 26 from Vox, 50 fro AS 2024-08-10 20:07:01,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=727650.0, ans=0.125 2024-08-10 20:07:01,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2024-08-10 20:07:03,539 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 20:07:08,063 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 20:07:08,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727750.0, ans=0.1 2024-08-10 20:07:09,665 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 20:07:10,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=727750.0, ans=0.125 2024-08-10 20:07:25,083 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 16 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 20:07:26,810 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 20:07:42,647 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 20:07:42,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=727950.0, ans=0.0 2024-08-10 20:07:45,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.770e+01 3.156e+01 3.793e+01 6.617e+01, threshold=6.313e+01, percent-clipped=1.0 2024-08-10 20:07:46,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=727950.0, ans=0.125 2024-08-10 20:07:58,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=728050.0, ans=0.0 2024-08-10 20:08:01,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728050.0, ans=0.1 2024-08-10 20:08:01,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728050.0, ans=0.1 2024-08-10 20:08:05,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=728050.0, ans=0.0 2024-08-10 20:08:07,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 350, loss[loss=0.1126, beats_loss=0.01101, ecapa_loss=0.0002032, whisper_loss=0.09953, over 22307.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01142, ecapa_loss=0.000226, whisper_loss=0.09415, over 3197620.82 frames. ], batch size: 86, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:08:32,238 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 20:08:35,246 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 20:08:36,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=728350.0, ans=0.0 2024-08-10 20:08:54,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=728450.0, ans=0.125 2024-08-10 20:08:56,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=728450.0, ans=0.125 2024-08-10 20:09:05,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728550.0, ans=0.125 2024-08-10 20:09:10,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=728550.0, ans=0.0 2024-08-10 20:09:18,435 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 22 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-10 20:09:21,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 400, loss[loss=0.09702, beats_loss=0.01311, ecapa_loss=0.0001903, whisper_loss=0.08201, over 16627.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0115, ecapa_loss=0.0002244, whisper_loss=0.09366, over 3364432.61 frames. 
], batch size: 66, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:10:12,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.814e+01 3.145e+01 3.714e+01 1.358e+02, threshold=6.291e+01, percent-clipped=2.0 2024-08-10 20:10:31,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=729050.0, ans=0.09899494936611666 2024-08-10 20:10:33,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 450, loss[loss=0.1122, beats_loss=0.01126, ecapa_loss=0.000247, whisper_loss=0.09849, over 15543.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01157, ecapa_loss=0.0002222, whisper_loss=0.09355, over 3461038.26 frames. ], batch size: 63, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:10:39,991 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 20:11:16,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=729450.0, ans=0.0 2024-08-10 20:11:46,008 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 20:11:47,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 500, loss[loss=0.08783, beats_loss=0.01318, ecapa_loss=0.0001907, whisper_loss=0.07275, over 17837.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0116, ecapa_loss=0.0002223, whisper_loss=0.09285, over 3546801.68 frames. ], batch size: 71, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:11:48,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-08-10 20:12:01,648 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 20:12:05,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729750.0, ans=0.1 2024-08-10 20:12:38,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=12.0 2024-08-10 20:12:41,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.724e+01 3.066e+01 3.405e+01 6.797e+01, threshold=6.131e+01, percent-clipped=1.0 2024-08-10 20:13:02,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-10 20:13:02,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 550, loss[loss=0.1146, beats_loss=0.01086, ecapa_loss=0.0002613, whisper_loss=0.1011, over 17170.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01161, ecapa_loss=0.0002192, whisper_loss=0.09301, over 3606483.88 frames. ], batch size: 67, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:13:26,437 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 20:13:34,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.11 vs. limit=10.0 2024-08-10 20:13:58,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=730450.0, ans=0.0 2024-08-10 20:14:35,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-10 20:14:42,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 600, loss[loss=0.09092, beats_loss=0.01168, ecapa_loss=0.0002774, whisper_loss=0.07647, over 21158.00 frames. 
], tot_loss[loss=0.1071, beats_loss=0.01166, ecapa_loss=0.0002165, whisper_loss=0.09326, over 3680137.43 frames. ], batch size: 93, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:14:43,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=730650.0, ans=0.125 2024-08-10 20:14:50,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=730650.0, ans=0.09899494936611666 2024-08-10 20:15:04,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=730750.0, ans=0.125 2024-08-10 20:15:14,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=730850.0, ans=0.125 2024-08-10 20:15:23,813 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 20:15:26,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=730850.0, ans=0.125 2024-08-10 20:15:37,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.552e+01 2.834e+01 3.243e+01 4.859e+01, threshold=5.668e+01, percent-clipped=0.0 2024-08-10 20:15:37,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730950.0, ans=0.125 2024-08-10 20:15:42,501 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 20:15:47,067 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 20:15:57,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731050.0, ans=0.125 2024-08-10 20:16:08,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 650, loss[loss=0.08547, beats_loss=0.01383, ecapa_loss=0.0002022, whisper_loss=0.06962, over 15228.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01154, ecapa_loss=0.0002176, whisper_loss=0.09377, over 3705003.13 frames. ], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:16:25,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0 2024-08-10 20:16:27,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731250.0, ans=0.1 2024-08-10 20:16:38,901 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-10 20:16:46,177 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 20:16:55,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731350.0, ans=0.125 2024-08-10 20:17:13,821 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 20:17:34,800 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 20:17:48,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-10 20:17:50,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 700, loss[loss=0.1169, beats_loss=0.0128, ecapa_loss=0.0001888, whisper_loss=0.1022, over 18739.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01146, ecapa_loss=0.0002189, whisper_loss=0.09411, over 3753456.25 frames. ], batch size: 73, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:17:56,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0 2024-08-10 20:17:57,873 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 20:18:17,724 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 20:19:14,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=731950.0, ans=0.125 2024-08-10 20:19:15,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.666e+01 3.015e+01 3.385e+01 4.873e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 20:19:23,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=731950.0, ans=0.07 2024-08-10 20:19:24,139 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 20:19:28,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=732050.0, ans=0.04949747468305833 2024-08-10 20:19:48,630 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 20:19:48,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=732150.0, ans=0.0 2024-08-10 20:19:49,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 750, loss[loss=0.09873, beats_loss=0.01262, ecapa_loss=0.0001949, whisper_loss=0.08416, over 17078.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01151, ecapa_loss=0.0002175, whisper_loss=0.09428, over 3769417.84 frames. 
], batch size: 66, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:19:55,552 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 20:20:02,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=732150.0, ans=0.025 2024-08-10 20:20:05,393 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 20:20:12,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-10 20:20:37,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2024-08-10 20:21:09,773 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-10 20:21:42,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=732550.0, ans=0.0 2024-08-10 20:21:48,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 800, loss[loss=0.09551, beats_loss=0.01045, ecapa_loss=0.0002758, whisper_loss=0.08231, over 20498.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01156, ecapa_loss=0.0002173, whisper_loss=0.09373, over 3777380.00 frames. ], batch size: 87, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:21:49,799 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-10 20:22:03,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=732650.0, ans=0.125 2024-08-10 20:22:08,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732650.0, ans=0.1 2024-08-10 20:22:25,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=732750.0, ans=0.2 2024-08-10 20:22:29,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-10 20:22:41,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.98 vs. limit=15.0 2024-08-10 20:23:13,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.807e+01 3.275e+01 3.755e+01 8.468e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 20:23:14,020 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 20:23:38,599 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 20:23:43,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 850, loss[loss=0.09913, beats_loss=0.01347, ecapa_loss=0.0002013, whisper_loss=0.08364, over 22021.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01159, ecapa_loss=0.0002172, whisper_loss=0.09246, over 3771748.59 frames. ], batch size: 89, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:23:47,155 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 20:24:15,519 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 20:24:31,449 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 20:24:37,038 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 20:24:44,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2024-08-10 20:25:08,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=733650.0, ans=0.0 2024-08-10 20:25:09,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 900, loss[loss=0.1176, beats_loss=0.01102, ecapa_loss=0.0002592, whisper_loss=0.104, over 18340.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01156, ecapa_loss=0.0002149, whisper_loss=0.09294, over 3769257.18 frames. ], batch size: 72, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:25:18,791 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 20:25:24,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733650.0, ans=0.1 2024-08-10 20:25:57,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=733850.0, ans=0.0 2024-08-10 20:26:08,987 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 20:26:11,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733950.0, ans=0.125 2024-08-10 20:26:12,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.731e+01 3.012e+01 3.536e+01 7.102e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-10 20:26:20,149 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-10 20:26:24,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:25,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=734050.0, ans=0.2 2024-08-10 20:26:29,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:33,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:38,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 950, loss[loss=0.09889, beats_loss=0.01593, ecapa_loss=0.000141, whisper_loss=0.08155, over 23155.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01157, ecapa_loss=0.0002147, whisper_loss=0.09242, over 3781303.72 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:26:58,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2024-08-10 20:27:18,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=734350.0, ans=0.0 2024-08-10 20:27:19,352 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 20:27:31,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=734450.0, ans=22.5 2024-08-10 20:27:35,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734450.0, ans=0.1 2024-08-10 20:27:51,629 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 20:27:56,802 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 20:28:01,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1000, loss[loss=0.1005, beats_loss=0.01095, ecapa_loss=0.0002605, whisper_loss=0.08692, over 21101.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01157, ecapa_loss=0.0002128, whisper_loss=0.09287, over 3763674.78 frames. ], batch size: 87, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:28:09,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.47 vs. limit=15.0 2024-08-10 20:28:29,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=734750.0, ans=0.125 2024-08-10 20:28:47,709 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 20:29:00,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.726e+01 3.092e+01 3.601e+01 1.041e+02, threshold=6.184e+01, percent-clipped=1.0 2024-08-10 20:29:06,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=734950.0, ans=0.0 2024-08-10 20:29:15,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=735050.0, ans=0.0 2024-08-10 20:29:16,217 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 20:29:25,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1050, loss[loss=0.1003, beats_loss=0.01041, ecapa_loss=0.0002466, whisper_loss=0.08747, over 21394.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01164, ecapa_loss=0.0002139, whisper_loss=0.09266, over 3776458.38 frames. 
], batch size: 89, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:29:29,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=735150.0, ans=0.125 2024-08-10 20:29:31,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=735150.0, ans=0.2 2024-08-10 20:29:43,364 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 20:29:47,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=735250.0, ans=0.125 2024-08-10 20:29:52,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=735250.0, ans=0.0 2024-08-10 20:29:54,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735250.0, ans=0.1 2024-08-10 20:30:11,353 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 20:30:44,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2024-08-10 20:30:45,098 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 20:30:48,579 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 20:30:51,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1100, loss[loss=0.1158, beats_loss=0.01301, ecapa_loss=0.0001901, whisper_loss=0.1009, over 24089.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01163, ecapa_loss=0.0002139, whisper_loss=0.09353, over 3828405.04 frames. ], batch size: 93, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:31:34,427 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 20:31:50,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.770e+01 3.006e+01 3.661e+01 6.910e+01, threshold=6.012e+01, percent-clipped=1.0 2024-08-10 20:31:55,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=735950.0, ans=0.125 2024-08-10 20:32:01,159 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 20:32:07,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736050.0, ans=0.1 2024-08-10 20:32:13,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736050.0, ans=0.0 2024-08-10 20:32:15,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1150, loss[loss=0.08913, beats_loss=0.01544, ecapa_loss=0.000193, whisper_loss=0.07176, over 20939.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01171, ecapa_loss=0.0002138, whisper_loss=0.09285, over 3848589.49 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:32:18,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=736150.0, ans=0.125 2024-08-10 20:32:33,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=736250.0, ans=0.0 2024-08-10 20:32:41,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2024-08-10 20:32:58,006 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 20:33:30,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736550.0, ans=0.0 2024-08-10 20:33:37,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-10 20:33:39,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1200, loss[loss=0.1197, beats_loss=0.009049, ecapa_loss=0.0002709, whisper_loss=0.108, over 17137.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01161, ecapa_loss=0.000214, whisper_loss=0.09311, over 3830282.95 frames. ], batch size: 71, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:33:45,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-10 20:33:55,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736750.0, ans=0.125 2024-08-10 20:34:05,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.16 vs. 
limit=15.0 2024-08-10 20:34:14,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=736850.0, ans=0.1 2024-08-10 20:34:29,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.067e-02 2024-08-10 20:34:29,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=736950.0, ans=0.2 2024-08-10 20:34:33,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.774e+01 3.140e+01 3.554e+01 5.402e+01, threshold=6.279e+01, percent-clipped=0.0 2024-08-10 20:34:38,246 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 20:34:43,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2024-08-10 20:34:43,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.81 vs. limit=15.0 2024-08-10 20:34:57,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1250, loss[loss=0.1168, beats_loss=0.006596, ecapa_loss=0.0002351, whisper_loss=0.1079, over 15217.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01166, ecapa_loss=0.0002144, whisper_loss=0.09305, over 3840113.86 frames. ], batch size: 56, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:35:24,320 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 20:35:26,016 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 20:35:28,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=737350.0, ans=0.0 2024-08-10 20:35:35,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737350.0, ans=0.125 2024-08-10 20:35:36,603 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 20:35:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=737450.0, ans=0.125 2024-08-10 20:36:12,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1300, loss[loss=0.09945, beats_loss=0.01416, ecapa_loss=0.0002177, whisper_loss=0.08311, over 21654.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01169, ecapa_loss=0.0002143, whisper_loss=0.09241, over 3824441.09 frames. ], batch size: 94, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:36:23,748 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 20:36:27,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=737750.0, ans=0.09899494936611666 2024-08-10 20:36:38,408 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 20:36:40,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=737750.0, ans=0.09899494936611666 2024-08-10 20:36:45,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=737850.0, ans=0.1 2024-08-10 20:36:48,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.44 vs. 
limit=10.0 2024-08-10 20:36:49,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737850.0, ans=0.125 2024-08-10 20:36:56,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-10 20:37:04,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=737950.0, ans=0.2 2024-08-10 20:37:08,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=737950.0, ans=0.2 2024-08-10 20:37:08,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.805e+01 3.070e+01 3.591e+01 5.506e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 20:37:23,755 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 20:37:34,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1350, loss[loss=0.1127, beats_loss=0.01159, ecapa_loss=0.0002011, whisper_loss=0.09912, over 19685.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01168, ecapa_loss=0.0002136, whisper_loss=0.09258, over 3851255.62 frames. ], batch size: 75, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:37:44,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=738150.0, ans=0.125 2024-08-10 20:38:07,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=738350.0, ans=0.0 2024-08-10 20:38:17,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738350.0, ans=0.1 2024-08-10 20:38:24,377 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 20:38:36,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=738450.0, ans=0.125 2024-08-10 20:38:37,857 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 20:38:40,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=738550.0, ans=0.0 2024-08-10 20:38:50,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.38 vs. limit=22.5 2024-08-10 20:38:56,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1400, loss[loss=0.118, beats_loss=0.01011, ecapa_loss=0.000234, whisper_loss=0.1055, over 19200.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01168, ecapa_loss=0.0002146, whisper_loss=0.09173, over 3820088.93 frames. ], batch size: 73, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:39:11,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=738650.0, ans=0.1 2024-08-10 20:39:43,057 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 20:39:52,406 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 20:39:54,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=738950.0, ans=0.125 2024-08-10 20:39:56,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.641e+01 2.966e+01 3.393e+01 5.160e+01, threshold=5.932e+01, percent-clipped=0.0 2024-08-10 20:39:58,892 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 20:40:02,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=738950.0, ans=0.0 2024-08-10 20:40:12,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=739050.0, ans=0.125 2024-08-10 20:40:20,564 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 20:40:20,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=739050.0, ans=0.125 2024-08-10 20:40:22,707 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 20:40:23,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1450, loss[loss=0.1182, beats_loss=0.01127, ecapa_loss=0.0002779, whisper_loss=0.1041, over 19854.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01159, ecapa_loss=0.0002156, whisper_loss=0.09245, over 3825034.24 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:41:01,603 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 20:41:11,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=739250.0, ans=0.125 2024-08-10 20:41:14,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739250.0, ans=0.1 2024-08-10 20:41:16,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=12.0 2024-08-10 20:41:27,373 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
30 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 20:41:30,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-08-10 20:41:39,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-10 20:42:04,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=739550.0, ans=0.125 2024-08-10 20:42:17,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=739650.0, ans=0.0 2024-08-10 20:42:18,439 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1500, loss[loss=0.1156, beats_loss=0.01023, ecapa_loss=0.0001846, whisper_loss=0.1036, over 24021.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01166, ecapa_loss=0.0002141, whisper_loss=0.09287, over 3855544.06 frames. ], batch size: 92, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:42:34,658 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 20:42:59,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=739850.0, ans=0.125 2024-08-10 20:43:02,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739850.0, ans=0.1 2024-08-10 20:43:14,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.729e+01 3.073e+01 3.413e+01 6.253e+01, threshold=6.146e+01, percent-clipped=1.0 2024-08-10 20:43:38,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1550, loss[loss=0.09419, beats_loss=0.01054, ecapa_loss=0.000283, whisper_loss=0.08082, over 18231.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01165, ecapa_loss=0.000214, whisper_loss=0.09288, over 3845725.26 frames. ], batch size: 78, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:43:45,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=740150.0, ans=0.0 2024-08-10 20:44:02,501 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 20:44:14,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=740350.0, ans=0.0 2024-08-10 20:44:24,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=740350.0, ans=0.2 2024-08-10 20:44:35,902 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 20:45:00,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1600, loss[loss=0.1195, beats_loss=0.009285, ecapa_loss=0.0002479, whisper_loss=0.1077, over 20956.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01158, ecapa_loss=0.0002128, whisper_loss=0.093, over 3854898.26 frames. 
], batch size: 84, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:45:00,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=740650.0, ans=0.125 2024-08-10 20:45:07,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740650.0, ans=0.125 2024-08-10 20:45:20,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=740750.0, ans=0.0 2024-08-10 20:45:22,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=740750.0, ans=0.0 2024-08-10 20:45:24,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740750.0, ans=0.1 2024-08-10 20:45:34,849 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 20:45:36,081 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 20:45:54,539 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 20:45:56,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2024-08-10 20:45:58,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.574e+01 2.929e+01 3.457e+01 5.264e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-10 20:46:07,146 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 20:46:07,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741050.0, ans=0.1 2024-08-10 20:46:15,511 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 20:46:23,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1650, loss[loss=0.1072, beats_loss=0.01046, ecapa_loss=0.0002479, whisper_loss=0.09428, over 20364.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01162, ecapa_loss=0.0002125, whisper_loss=0.09297, over 3861759.70 frames. ], batch size: 80, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:46:26,942 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 20:46:28,334 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-10 20:46:28,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=741150.0, ans=0.05 2024-08-10 20:46:40,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=741250.0, ans=0.0 2024-08-10 20:46:40,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=741250.0, ans=0.125 2024-08-10 20:46:59,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741350.0, ans=0.1 2024-08-10 20:47:01,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2024-08-10 20:47:26,009 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 20:47:27,902 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 20:47:32,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=741550.0, ans=0.95 2024-08-10 20:47:40,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1700, loss[loss=0.1117, beats_loss=0.01365, ecapa_loss=0.0001575, whisper_loss=0.09643, over 18312.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01165, ecapa_loss=0.0002113, whisper_loss=0.09285, over 3868798.07 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:47:54,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741650.0, ans=0.1 2024-08-10 20:47:55,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=741750.0, ans=0.125 2024-08-10 20:48:12,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741850.0, ans=0.125 2024-08-10 20:48:22,374 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 20:48:22,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=741850.0, ans=0.1 2024-08-10 20:48:31,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=741950.0, ans=0.125 2024-08-10 20:48:34,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.737e+01 3.042e+01 3.583e+01 5.597e+01, threshold=6.084e+01, percent-clipped=0.0 2024-08-10 20:48:37,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=741950.0, ans=0.09899494936611666 2024-08-10 20:48:41,690 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 20:48:41,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=742050.0, ans=0.0 2024-08-10 20:48:44,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742050.0, ans=0.1 2024-08-10 20:48:56,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1750, loss[loss=0.11, beats_loss=0.01174, ecapa_loss=0.0001862, whisper_loss=0.09642, over 17911.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01158, ecapa_loss=0.0002108, whisper_loss=0.09309, over 3854286.05 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:49:10,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=742250.0, ans=0.125 2024-08-10 20:49:20,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=742250.0, ans=0.2 2024-08-10 20:49:21,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742250.0, ans=0.0 2024-08-10 20:49:28,469 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 20:49:37,517 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:50:11,705 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1800, loss[loss=0.1208, beats_loss=0.009439, ecapa_loss=0.0002599, whisper_loss=0.1087, over 16997.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01154, ecapa_loss=0.0002105, whisper_loss=0.09366, over 3863673.74 frames. 
], batch size: 66, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:50:24,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=742650.0, ans=10.0 2024-08-10 20:50:25,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=742750.0, ans=0.1 2024-08-10 20:51:00,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742950.0, ans=0.1 2024-08-10 20:51:05,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.668e+01 3.016e+01 3.512e+01 6.004e+01, threshold=6.033e+01, percent-clipped=0.0 2024-08-10 20:51:06,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=742950.0, ans=0.125 2024-08-10 20:51:13,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-08-10 20:51:29,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1850, loss[loss=0.1056, beats_loss=0.01138, ecapa_loss=0.0002311, whisper_loss=0.09193, over 22837.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01143, ecapa_loss=0.0002127, whisper_loss=0.09466, over 3883948.46 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:51:45,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-10 20:51:50,659 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 20:51:53,850 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.854e-01 2024-08-10 20:52:10,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=743350.0, ans=0.0 2024-08-10 20:52:13,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=743450.0, ans=0.5 2024-08-10 20:52:16,654 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 20:52:44,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1900, loss[loss=0.1199, beats_loss=0.008192, ecapa_loss=0.0001875, whisper_loss=0.1098, over 14878.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01141, ecapa_loss=0.000217, whisper_loss=0.09446, over 3823735.18 frames. ], batch size: 55, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:52:44,949 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 20:52:49,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=743650.0, ans=0.0 2024-08-10 20:52:55,152 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:53:11,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743750.0, ans=0.125 2024-08-10 20:53:36,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.690e+01 3.145e+01 3.666e+01 6.863e+01, threshold=6.290e+01, percent-clipped=1.0 2024-08-10 20:53:49,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=744050.0, ans=0.125 2024-08-10 20:54:01,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 1950, loss[loss=0.1012, beats_loss=0.01257, ecapa_loss=0.0002489, whisper_loss=0.08614, over 21784.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01155, ecapa_loss=0.0002167, whisper_loss=0.09392, over 3845591.28 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:54:06,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=744150.0, ans=0.0 2024-08-10 20:54:11,171 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 20:54:24,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=744250.0, ans=0.2 2024-08-10 20:54:25,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2024-08-10 20:54:25,978 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 20:54:27,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=744250.0, ans=0.0 2024-08-10 20:54:34,170 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 20:54:39,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-10 20:54:46,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=744450.0, ans=0.125 2024-08-10 20:55:03,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=744550.0, ans=0.0 2024-08-10 20:55:15,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=744550.0, ans=0.2 2024-08-10 20:55:18,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2000, loss[loss=0.1152, beats_loss=0.01169, ecapa_loss=0.0002359, whisper_loss=0.1012, over 12845.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0116, ecapa_loss=0.000219, whisper_loss=0.09399, over 3839490.15 frames. 
], batch size: 54, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:55:57,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=744850.0, ans=0.0 2024-08-10 20:56:16,445 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.753e+01 3.103e+01 3.441e+01 5.353e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-10 20:56:17,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=744950.0, ans=0.0 2024-08-10 20:56:19,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=744950.0, ans=0.2 2024-08-10 20:56:41,074 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 20:56:42,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2050, loss[loss=0.09127, beats_loss=0.0109, ecapa_loss=0.0001655, whisper_loss=0.07871, over 14521.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01162, ecapa_loss=0.0002194, whisper_loss=0.09374, over 3828139.09 frames. ], batch size: 53, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:56:47,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=745150.0, ans=0.0 2024-08-10 20:56:47,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-10 20:56:51,694 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-10 20:56:53,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=745150.0, ans=0.125 2024-08-10 20:56:56,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=745250.0, ans=0.125 2024-08-10 20:57:06,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5 2024-08-10 20:57:27,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=745350.0, ans=0.125 2024-08-10 20:57:35,715 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 20:57:48,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=745550.0, ans=0.125 2024-08-10 20:57:48,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=745550.0, ans=0.0 2024-08-10 20:57:51,985 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-10 20:57:56,069 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 20:58:02,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2100, loss[loss=0.1022, beats_loss=0.008934, ecapa_loss=0.0002577, whisper_loss=0.09066, over 13883.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01171, ecapa_loss=0.0002198, whisper_loss=0.09312, over 3810511.50 frames. ], batch size: 53, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:58:41,813 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 20:59:04,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=745950.0, ans=0.07 2024-08-10 20:59:06,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.787e+01 3.226e+01 3.870e+01 7.991e+01, threshold=6.452e+01, percent-clipped=3.0 2024-08-10 20:59:16,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=22.5 2024-08-10 20:59:23,630 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 20:59:28,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746050.0, ans=0.125 2024-08-10 20:59:31,401 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2150, loss[loss=0.1056, beats_loss=0.01136, ecapa_loss=0.0001973, whisper_loss=0.0923, over 14342.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0117, ecapa_loss=0.0002206, whisper_loss=0.09311, over 3825840.36 frames. 
], batch size: 54, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:59:53,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=746250.0, ans=0.0 2024-08-10 20:59:57,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746250.0, ans=0.0 2024-08-10 21:00:25,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=746450.0, ans=0.125 2024-08-10 21:00:33,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=746450.0, ans=0.125 2024-08-10 21:00:57,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2200, loss[loss=0.1131, beats_loss=0.01367, ecapa_loss=0.0002002, whisper_loss=0.09743, over 18044.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0117, ecapa_loss=0.0002207, whisper_loss=0.09416, over 3812067.13 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:00:58,011 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-10 21:01:07,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=746650.0, ans=0.0 2024-08-10 21:01:34,668 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 21:01:59,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.667e+01 3.183e+01 3.944e+01 1.052e+02, threshold=6.365e+01, percent-clipped=1.0 2024-08-10 21:02:01,747 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 21:02:08,272 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
40 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 21:02:08,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=747050.0, ans=0.125 2024-08-10 21:02:11,170 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-10 21:02:24,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2250, loss[loss=0.1069, beats_loss=0.0107, ecapa_loss=0.0002746, whisper_loss=0.09348, over 17580.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0117, ecapa_loss=0.0002222, whisper_loss=0.09442, over 3835044.46 frames. ], batch size: 69, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:02:33,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-10 21:02:40,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=747250.0, ans=0.0 2024-08-10 21:02:50,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=747250.0, ans=0.0 2024-08-10 21:03:00,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=747350.0, ans=0.125 2024-08-10 21:03:00,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-08-10 21:03:01,544 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 21:03:11,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=747350.0, ans=0.125 2024-08-10 21:03:21,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2024-08-10 21:03:34,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-10 21:03:36,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747550.0, ans=0.125 2024-08-10 21:03:36,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747550.0, ans=0.125 2024-08-10 21:03:51,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2300, loss[loss=0.1238, beats_loss=0.01009, ecapa_loss=0.0002512, whisper_loss=0.1112, over 15329.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01175, ecapa_loss=0.0002209, whisper_loss=0.09449, over 3846128.25 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:04:02,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2024-08-10 21:04:18,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=747750.0, ans=0.125 2024-08-10 21:04:26,776 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 21:04:37,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. 
limit=22.5 2024-08-10 21:04:53,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.764e+01 3.059e+01 3.552e+01 5.257e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-10 21:05:15,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748050.0, ans=0.1 2024-08-10 21:05:17,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748050.0, ans=0.125 2024-08-10 21:05:19,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2350, loss[loss=0.1155, beats_loss=0.0115, ecapa_loss=0.0002168, whisper_loss=0.1019, over 20449.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01165, ecapa_loss=0.0002215, whisper_loss=0.09486, over 3843390.40 frames. ], batch size: 79, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:06:00,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=748350.0, ans=0.125 2024-08-10 21:06:02,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=748350.0, ans=0.2 2024-08-10 21:06:28,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0 2024-08-10 21:06:51,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0 2024-08-10 21:06:53,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748550.0, ans=0.1 2024-08-10 21:07:08,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2400, loss[loss=0.08219, beats_loss=0.0161, ecapa_loss=0.0002423, whisper_loss=0.06367, over 16843.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01165, ecapa_loss=0.0002229, whisper_loss=0.09437, over 3869598.39 frames. ], batch size: 72, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:07:13,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748650.0, ans=0.1 2024-08-10 21:07:22,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-10 21:07:29,040 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-10 21:07:39,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-08-10 21:08:21,266 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 21:08:47,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.712e+01 3.107e+01 3.563e+01 2.420e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-10 21:09:29,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2450, loss[loss=0.09252, beats_loss=0.01131, ecapa_loss=0.0002357, whisper_loss=0.07886, over 15780.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01168, ecapa_loss=0.0002245, whisper_loss=0.09383, over 3877282.69 frames. 
], batch size: 67, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:09:42,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=749150.0, ans=0.0 2024-08-10 21:09:48,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=749150.0, ans=0.0 2024-08-10 21:09:52,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=749250.0, ans=0.0 2024-08-10 21:10:02,841 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 13 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 21:10:06,440 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 21:10:09,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=749250.0, ans=0.125 2024-08-10 21:10:15,791 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 21:10:17,349 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 21:10:28,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=749450.0, ans=0.125 2024-08-10 21:10:29,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=749450.0, ans=0.125 2024-08-10 21:10:36,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749450.0, ans=0.125 2024-08-10 21:10:47,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=749550.0, ans=0.125 2024-08-10 21:11:01,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2500, loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0002955, whisper_loss=0.08978, over 20719.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01155, ecapa_loss=0.0002275, whisper_loss=0.09443, over 3879265.55 frames. ], batch size: 91, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:11:11,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=749650.0, ans=0.125 2024-08-10 21:11:27,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=749750.0, ans=0.125 2024-08-10 21:11:30,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. 
limit=12.0 2024-08-10 21:11:33,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749750.0, ans=0.125 2024-08-10 21:12:03,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.786e+01 3.132e+01 3.631e+01 5.389e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:12:05,460 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 21:12:21,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=750050.0, ans=0.2 2024-08-10 21:12:29,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0 2024-08-10 21:12:32,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2550, loss[loss=0.09369, beats_loss=0.01404, ecapa_loss=0.000199, whisper_loss=0.07766, over 14622.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0116, ecapa_loss=0.0002251, whisper_loss=0.09466, over 3885382.57 frames. ], batch size: 56, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:12:47,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=750150.0, ans=0.5 2024-08-10 21:13:01,973 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 21:13:07,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750350.0, ans=0.1 2024-08-10 21:13:14,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=750350.0, ans=0.95 2024-08-10 21:13:20,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=750350.0, ans=0.0 2024-08-10 21:13:32,558 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-10 21:13:45,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=750450.0, ans=0.07 2024-08-10 21:13:53,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=750550.0, ans=0.0 2024-08-10 21:13:57,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=750550.0, ans=0.2 2024-08-10 21:14:07,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-10 21:14:07,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2600, loss[loss=0.09769, beats_loss=0.01196, ecapa_loss=0.0002347, whisper_loss=0.08338, over 18114.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002251, whisper_loss=0.09448, over 3893097.96 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:14:32,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. 
limit=15.0 2024-08-10 21:14:37,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=750750.0, ans=0.0 2024-08-10 21:14:41,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-10 21:14:43,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750850.0, ans=0.125 2024-08-10 21:14:48,342 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 21:15:09,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.826e+01 3.235e+01 3.900e+01 8.164e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 21:15:18,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-10 21:15:24,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=751050.0, ans=0.125 2024-08-10 21:15:27,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=751050.0, ans=0.125 2024-08-10 21:15:30,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=751050.0, ans=0.125 2024-08-10 21:15:33,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=751150.0, ans=0.125 2024-08-10 21:15:33,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2650, loss[loss=0.09941, beats_loss=0.01468, ecapa_loss=0.0002025, whisper_loss=0.0827, over 22415.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01156, ecapa_loss=0.0002248, whisper_loss=0.09464, over 3905034.16 frames. 
], batch size: 93, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:15:43,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751150.0, ans=0.1 2024-08-10 21:15:46,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.30 vs. limit=15.0 2024-08-10 21:15:57,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-10 21:15:59,967 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 21:16:20,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=751350.0, ans=0.125 2024-08-10 21:16:23,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=751350.0, ans=0.025 2024-08-10 21:16:42,392 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 21:17:02,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2700, loss[loss=0.1194, beats_loss=0.01094, ecapa_loss=0.0002564, whisper_loss=0.1059, over 17388.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01159, ecapa_loss=0.0002253, whisper_loss=0.09423, over 3927290.40 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:17:11,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=751650.0, ans=0.125 2024-08-10 21:17:23,910 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
25 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 21:17:27,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=751750.0, ans=0.05 2024-08-10 21:18:03,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 3.016e+01 3.341e+01 3.971e+01 1.144e+02, threshold=6.682e+01, percent-clipped=3.0 2024-08-10 21:18:09,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=751950.0, ans=0.0 2024-08-10 21:18:11,032 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 21:18:25,935 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 21:18:28,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2750, loss[loss=0.1143, beats_loss=0.01316, ecapa_loss=0.0001767, whisper_loss=0.09938, over 16970.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01168, ecapa_loss=0.0002264, whisper_loss=0.09389, over 3877024.92 frames. ], batch size: 64, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:18:47,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=752250.0, ans=0.1 2024-08-10 21:18:49,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=752250.0, ans=0.0 2024-08-10 21:18:51,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-10 21:19:12,247 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 21:19:20,581 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 21:19:33,623 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 12 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 21:19:40,769 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 21:19:42,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752550.0, ans=0.1 2024-08-10 21:19:54,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2800, loss[loss=0.1072, beats_loss=0.01123, ecapa_loss=0.0002031, whisper_loss=0.09391, over 15742.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0117, ecapa_loss=0.0002246, whisper_loss=0.09363, over 3852525.52 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:20:02,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=752650.0, ans=0.0 2024-08-10 21:20:03,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=752650.0, ans=0.0 2024-08-10 21:20:20,505 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 21:20:36,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=752850.0, ans=0.125 2024-08-10 21:20:43,007 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 21:20:52,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=752950.0, ans=0.0 2024-08-10 21:20:53,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.721e+01 3.078e+01 3.353e+01 6.515e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-10 21:21:16,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753050.0, ans=0.1 2024-08-10 21:21:20,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2850, loss[loss=0.1121, beats_loss=0.009371, ecapa_loss=0.0002663, whisper_loss=0.1001, over 17552.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0117, ecapa_loss=0.0002236, whisper_loss=0.09419, over 3851514.88 frames. ], batch size: 72, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:21:28,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=753150.0, ans=0.0 2024-08-10 21:21:33,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2024-08-10 21:21:44,957 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:21:54,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=753250.0, ans=0.125 2024-08-10 21:22:21,647 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 21:22:25,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2024-08-10 21:22:53,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2900, loss[loss=0.1015, beats_loss=0.01426, ecapa_loss=0.0001886, whisper_loss=0.08539, over 22944.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01173, ecapa_loss=0.0002235, whisper_loss=0.09433, over 3883827.51 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:23:10,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=753750.0, ans=10.0 2024-08-10 21:23:28,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-10 21:23:39,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753850.0, ans=0.125 2024-08-10 21:23:53,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.758e+01 3.070e+01 3.678e+01 5.521e+01, threshold=6.141e+01, percent-clipped=0.0 2024-08-10 21:23:56,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0 2024-08-10 21:24:17,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754150.0, ans=0.1 2024-08-10 21:24:18,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 2950, loss[loss=0.105, beats_loss=0.01502, ecapa_loss=0.0001907, whisper_loss=0.08808, over 22605.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01186, ecapa_loss=0.0002226, whisper_loss=0.09439, over 3906982.90 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:24:19,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. 
limit=22.5 2024-08-10 21:24:26,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=754150.0, ans=0.2 2024-08-10 21:24:51,891 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 21:24:52,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=754350.0, ans=0.125 2024-08-10 21:25:06,417 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 21:25:08,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=754450.0, ans=0.0 2024-08-10 21:25:10,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=754450.0, ans=0.09899494936611666 2024-08-10 21:25:23,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754550.0, ans=0.1 2024-08-10 21:25:39,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3000, loss[loss=0.109, beats_loss=0.01083, ecapa_loss=0.0001596, whisper_loss=0.09658, over 16625.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01185, ecapa_loss=0.0002227, whisper_loss=0.09379, over 3910300.91 frames. ], batch size: 59, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:25:39,058 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 21:26:18,901 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0007066, whisper_loss=0.2527, over 922467.00 frames. 2024-08-10 21:26:38,571 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on SV_voxceleb1: loss=0.005938, beats_loss=0, ecapa_loss=0.0005938, whisper_loss=0, over 939242.00 frames. 
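A note on reading the `loss=` fields above: the totals are consistent with a weighted sum of the three distillation losses using the scales from the config header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). The sketch below is a hypothetical helper illustrating that relationship, not the actual code in `train_multi_KD3.py`:

```python
# Hypothetical sketch: how the logged `loss=` field relates to its components.
# Scales are taken from the run's config header; this is NOT the training code.
BEATS_SCALE = 1.0    # beats_loss_scale
ECAPA_SCALE = 10.0   # ecapa_loss_scale
WHISPER_SCALE = 1.0  # whisper_loss_scale

def total_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Weighted sum matching the `loss=` field in the log lines."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Values from the Epoch 6, batch 3000 tot_loss record above:
# loss=0.1079, beats_loss=0.01185, ecapa_loss=0.0002227, whisper_loss=0.09379
assert abs(total_loss(0.01185, 0.0002227, 0.09379) - 0.1079) < 1e-4
```

The same check holds for the per-batch loss fields throughout the log (the small residual is rounding in the printed four-significant-figure values).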
2024-08-10 21:28:42,335 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on AT_audioset: loss=0.02614, beats_loss=0.02614, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 21:28:42,339 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 21:28:42,741 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.429e+03 2024-08-10 21:29:00,420 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 21:29:00,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=754750.0, ans=0.0 2024-08-10 21:29:02,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=754750.0, ans=0.0 2024-08-10 21:29:27,523 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-10 21:29:30,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=754950.0, ans=0.2 2024-08-10 21:29:36,000 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 21:29:38,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.874e+01 3.287e+01 3.873e+01 6.300e+01, threshold=6.573e+01, percent-clipped=1.0 2024-08-10 21:29:41,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.74 vs. limit=15.0 2024-08-10 21:29:46,576 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 21:30:02,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3050, loss[loss=0.08486, beats_loss=0.0111, ecapa_loss=0.0002421, whisper_loss=0.07134, over 15289.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.0117, ecapa_loss=0.0002243, whisper_loss=0.09469, over 3909316.40 frames. ], batch size: 62, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:30:24,482 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 33 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 21:30:24,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=755250.0, ans=0.125 2024-08-10 21:30:43,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755350.0, ans=0.1 2024-08-10 21:30:51,839 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 21:30:58,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=755450.0, ans=0.125 2024-08-10 21:31:01,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=755450.0, ans=0.04949747468305833 2024-08-10 21:31:15,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755550.0, ans=0.1 2024-08-10 21:31:20,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755550.0, ans=0.125 2024-08-10 21:31:22,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3100, loss[loss=0.09464, beats_loss=0.01311, ecapa_loss=0.0002028, whisper_loss=0.07951, over 19506.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01168, ecapa_loss=0.0002262, whisper_loss=0.09483, over 3883448.45 frames. 
], batch size: 78, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:31:25,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=755650.0, ans=10.0 2024-08-10 21:31:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=755750.0, ans=0.125 2024-08-10 21:31:52,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=755750.0, ans=0.125 2024-08-10 21:32:01,010 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 21:32:02,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755850.0, ans=0.0 2024-08-10 21:32:12,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.75 vs. limit=5.0 2024-08-10 21:32:21,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.626e+01 2.939e+01 3.498e+01 4.571e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-10 21:32:33,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-08-10 21:32:42,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756050.0, ans=0.125 2024-08-10 21:32:44,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3150, loss[loss=0.1165, beats_loss=0.01225, ecapa_loss=0.0002386, whisper_loss=0.1018, over 17419.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01173, ecapa_loss=0.0002263, whisper_loss=0.09437, over 3886741.83 frames. 
], batch size: 70, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:32:50,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=756150.0, ans=0.2 2024-08-10 21:32:53,315 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 21:33:02,179 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 21:33:02,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756250.0, ans=0.1 2024-08-10 21:33:04,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-10 21:33:13,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756250.0, ans=0.125 2024-08-10 21:33:22,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-08-10 21:33:51,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-10 21:33:52,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 21:33:53,344 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 21:34:04,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3200, loss[loss=0.1208, beats_loss=0.01202, ecapa_loss=0.0002273, whisper_loss=0.1065, over 16900.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01177, ecapa_loss=0.0002257, whisper_loss=0.09416, over 3838026.00 frames. 
], batch size: 69, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:34:05,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=756650.0, ans=0.0 2024-08-10 21:34:18,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=756650.0, ans=0.0 2024-08-10 21:34:25,010 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 21:34:25,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=756750.0, ans=0.125 2024-08-10 21:34:42,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:50,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=756850.0, ans=0.0 2024-08-10 21:34:51,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=756850.0, ans=0.0 2024-08-10 21:34:56,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=756950.0, ans=15.0 2024-08-10 21:34:57,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-08-10 21:35:02,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-10 21:35:03,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.765e+01 3.113e+01 3.844e+01 7.476e+01, threshold=6.225e+01, percent-clipped=4.0 2024-08-10 21:35:24,668 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 21:35:26,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3250, loss[loss=0.1007, beats_loss=0.01099, ecapa_loss=0.0002964, whisper_loss=0.08671, over 20982.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01171, ecapa_loss=0.0002244, whisper_loss=0.0947, over 3841435.15 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:36:08,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=757350.0, ans=10.0 2024-08-10 21:36:21,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=757450.0, ans=0.125 2024-08-10 21:36:49,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=757650.0, ans=0.0 2024-08-10 21:36:50,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3300, loss[loss=0.1055, beats_loss=0.01344, ecapa_loss=0.0002144, whisper_loss=0.08987, over 15671.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0117, ecapa_loss=0.0002253, whisper_loss=0.0943, over 3840111.62 frames. ], batch size: 65, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:37:02,544 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 14 from Vox, 52 fro AS 2024-08-10 21:37:13,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=757750.0, ans=12.0 2024-08-10 21:37:41,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.67 vs. limit=22.5 2024-08-10 21:37:44,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757950.0, ans=0.125 2024-08-10 21:37:47,897 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 21:37:50,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.783e+01 3.080e+01 3.590e+01 5.176e+01, threshold=6.160e+01, percent-clipped=0.0 2024-08-10 21:37:52,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2024-08-10 21:37:57,807 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 21:37:59,477 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 21:38:15,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3350, loss[loss=0.1187, beats_loss=0.01013, ecapa_loss=0.0002419, whisper_loss=0.1061, over 21876.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01152, ecapa_loss=0.0002257, whisper_loss=0.09575, over 3863998.05 frames. ], batch size: 88, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:38:16,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=758150.0, ans=0.2 2024-08-10 21:38:22,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-08-10 21:38:31,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-10 21:38:41,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=758250.0, ans=0.125 2024-08-10 21:38:43,938 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-10 21:38:45,215 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 21:38:49,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.70 vs. limit=22.5 2024-08-10 21:38:56,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2024-08-10 21:38:59,028 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 21:39:09,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2024-08-10 21:39:27,583 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 21:39:33,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3400, loss[loss=0.09672, beats_loss=0.01219, ecapa_loss=0.0002196, whisper_loss=0.08233, over 19332.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01159, ecapa_loss=0.0002234, whisper_loss=0.09472, over 3872052.28 frames. ], batch size: 77, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:39:37,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=758650.0, ans=0.0 2024-08-10 21:39:41,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. 
limit=10.0 2024-08-10 21:39:57,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=758750.0, ans=0.2 2024-08-10 21:40:32,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.745e+01 3.132e+01 3.636e+01 5.691e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:40:47,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=759050.0, ans=0.2 2024-08-10 21:40:56,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3450, loss[loss=0.08707, beats_loss=0.01345, ecapa_loss=0.0002408, whisper_loss=0.07121, over 21425.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01155, ecapa_loss=0.0002236, whisper_loss=0.09505, over 3882379.20 frames. ], batch size: 93, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:40:56,753 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 21:41:01,547 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 21:41:06,379 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
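The `optim.py` lines report grad-norm quartiles (min, 25%, median, 75%, max) together with `Clipping_scale=2.0` and a `threshold`. A sketch of the relationship, inferred purely from the logged numbers (not from the actual icefall `optim.py` source): the threshold is `Clipping_scale` times the median grad norm, e.g. 2.0 × 3.132e+01 = 6.264e+01 in the line above.

```python
# Inferred relationship between the logged grad-norm quartiles and the
# clipping threshold: threshold = Clipping_scale * median grad norm.
clipping_scale = 2.0

# min, 25%, median, 75%, max grad norms from the optim.py line above
quartiles = [22.62, 27.45, 31.32, 36.36, 56.91]
median = quartiles[2]

threshold = clipping_scale * median
print(threshold)  # → 62.64, matching the logged threshold=6.264e+01
```

`percent-clipped` then reports how often batch grad norms exceeded this threshold; it stays at 0.0 in most of the lines here because even the max quartile (5.691e+01) sits below 6.264e+01.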
20 from LS+wenet, 10 from Vox, 42 fro AS 2024-08-10 21:41:06,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=759150.0, ans=0.125 2024-08-10 21:41:15,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=759250.0, ans=0.1 2024-08-10 21:41:52,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=759450.0, ans=0.0 2024-08-10 21:42:04,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759550.0, ans=0.125 2024-08-10 21:42:08,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=759550.0, ans=0.125 2024-08-10 21:42:12,931 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 21:42:19,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3500, loss[loss=0.08511, beats_loss=0.01157, ecapa_loss=0.0002213, whisper_loss=0.07133, over 13672.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01157, ecapa_loss=0.0002252, whisper_loss=0.09505, over 3907728.21 frames. ], batch size: 54, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:42:19,682 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 21:42:20,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=759650.0, ans=0.125 2024-08-10 21:42:21,485 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
31 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 21:42:22,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=759650.0, ans=0.0 2024-08-10 21:42:32,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=759650.0, ans=0.125 2024-08-10 21:42:41,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2024-08-10 21:42:57,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-10 21:42:58,648 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 21:43:11,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.666e+01 2.958e+01 3.304e+01 6.870e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-10 21:43:17,352 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 21:43:17,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760050.0, ans=0.1 2024-08-10 21:43:24,832 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 21:43:29,021 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 21:43:31,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3550, loss[loss=0.1339, beats_loss=0.008351, ecapa_loss=0.0002295, whisper_loss=0.1233, over 15749.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01154, ecapa_loss=0.000224, whisper_loss=0.09497, over 3875108.57 frames. 
], batch size: 61, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:43:41,185 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-10 21:43:46,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2024-08-10 21:43:53,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=760250.0, ans=0.0 2024-08-10 21:44:00,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=760350.0, ans=0.2 2024-08-10 21:44:08,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=760350.0, ans=0.0 2024-08-10 21:44:14,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-10 21:44:24,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5 2024-08-10 21:44:25,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760550.0, ans=0.125 2024-08-10 21:44:33,031 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 21:44:34,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=760550.0, ans=0.125 2024-08-10 21:44:36,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3600, loss[loss=0.1143, beats_loss=0.01122, ecapa_loss=0.0002566, whisper_loss=0.1006, over 19333.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.0002251, whisper_loss=0.09482, over 3875777.14 frames. 
], batch size: 77, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:44:38,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=760650.0, ans=0.0 2024-08-10 21:44:40,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=760650.0, ans=0.125 2024-08-10 21:44:50,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-10 21:44:56,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-08-10 21:45:01,227 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 21:45:05,090 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-10 21:45:17,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=760950.0, ans=0.0 2024-08-10 21:45:19,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=760950.0, ans=0.0 2024-08-10 21:45:21,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760950.0, ans=0.125 2024-08-10 21:45:23,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.660e+01 3.011e+01 3.359e+01 4.667e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-10 21:45:26,499 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 21:45:28,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760950.0, ans=0.0 2024-08-10 21:45:43,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3650, loss[loss=0.105, beats_loss=0.01158, ecapa_loss=0.0002365, whisper_loss=0.09104, over 21067.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01152, ecapa_loss=0.0002257, whisper_loss=0.0949, over 3871266.20 frames. ], batch size: 87, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:45:53,385 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 21:45:58,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=761250.0, ans=0.0 2024-08-10 21:46:07,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=761250.0, ans=0.2 2024-08-10 21:46:08,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=761350.0, ans=0.125 2024-08-10 21:46:09,526 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 21:46:21,646 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 21:46:37,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=761550.0, ans=0.2 2024-08-10 21:46:40,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761550.0, ans=0.1 2024-08-10 21:46:48,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3700, loss[loss=0.1436, beats_loss=0.007977, ecapa_loss=0.0002464, whisper_loss=0.1332, over 19180.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01153, ecapa_loss=0.0002238, whisper_loss=0.09478, over 3900953.05 frames. ], batch size: 74, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:47:04,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=761750.0, ans=0.015 2024-08-10 21:47:19,273 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 21:47:30,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=761950.0, ans=0.125 2024-08-10 21:47:35,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.698e+01 3.015e+01 3.307e+01 5.689e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 21:47:52,826 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 21:47:53,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=762050.0, ans=0.0 2024-08-10 21:47:53,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=762050.0, ans=0.125 2024-08-10 21:47:55,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3750, loss[loss=0.1187, beats_loss=0.01206, ecapa_loss=0.0002212, whisper_loss=0.1045, over 22286.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01166, ecapa_loss=0.0002241, whisper_loss=0.0942, over 3893029.32 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:48:06,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=762150.0, ans=0.125 2024-08-10 21:48:06,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. 
limit=15.0 2024-08-10 21:48:11,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=762250.0, ans=0.05 2024-08-10 21:48:19,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=762250.0, ans=0.0 2024-08-10 21:48:30,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=762350.0, ans=0.0 2024-08-10 21:48:36,872 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 21:48:53,412 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 39 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 21:49:01,274 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3800, loss[loss=0.1109, beats_loss=0.009484, ecapa_loss=0.0003082, whisper_loss=0.09836, over 15257.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01168, ecapa_loss=0.0002249, whisper_loss=0.09482, over 3905136.09 frames. ], batch size: 62, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:49:07,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=762650.0, ans=0.0 2024-08-10 21:49:26,499 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
13 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 21:49:47,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.822e+01 3.123e+01 3.627e+01 5.849e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-10 21:49:49,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=762950.0, ans=0.1 2024-08-10 21:49:50,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=762950.0, ans=0.125 2024-08-10 21:50:07,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3850, loss[loss=0.1156, beats_loss=0.009896, ecapa_loss=0.0002247, whisper_loss=0.1035, over 18376.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01172, ecapa_loss=0.0002238, whisper_loss=0.09519, over 3894102.77 frames. ], batch size: 74, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:50:11,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=763150.0, ans=0.0 2024-08-10 21:50:27,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2024-08-10 21:50:28,181 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 21:50:28,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=763250.0, ans=0.0 2024-08-10 21:50:40,332 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 21:50:42,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=763350.0, ans=0.0 2024-08-10 21:50:42,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=763350.0, ans=0.125 2024-08-10 21:51:12,983 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3900, loss[loss=0.1198, beats_loss=0.01152, ecapa_loss=0.0002323, whisper_loss=0.106, over 22838.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01166, ecapa_loss=0.0002238, whisper_loss=0.09568, over 3900911.04 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:51:21,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=763650.0, ans=0.0 2024-08-10 21:51:40,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=763850.0, ans=0.125 2024-08-10 21:51:52,171 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-10 21:51:53,338 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 21:51:58,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.868e+01 3.190e+01 3.521e+01 6.195e+01, threshold=6.380e+01, percent-clipped=0.0 2024-08-10 21:52:02,955 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 21:52:04,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=764050.0, ans=0.125 2024-08-10 21:52:17,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 3950, loss[loss=0.1134, beats_loss=0.011, ecapa_loss=0.0002195, whisper_loss=0.1002, over 20897.00 frames. 
], tot_loss[loss=0.1092, beats_loss=0.01165, ecapa_loss=0.0002243, whisper_loss=0.09531, over 3881904.74 frames. ], batch size: 84, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:52:24,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=764150.0, ans=0.125 2024-08-10 21:52:32,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=764250.0, ans=0.0 2024-08-10 21:52:38,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=764250.0, ans=0.0 2024-08-10 21:52:41,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-10 21:53:03,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=764450.0, ans=0.125 2024-08-10 21:53:05,980 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 21:53:17,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=764550.0, ans=0.0 2024-08-10 21:53:24,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4000, loss[loss=0.1198, beats_loss=0.01169, ecapa_loss=0.0002318, whisper_loss=0.1058, over 21446.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01159, ecapa_loss=0.000224, whisper_loss=0.0955, over 3878776.82 frames. 
], batch size: 85, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:53:25,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=764650.0, ans=0.05 2024-08-10 21:53:27,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=764650.0, ans=0.0 2024-08-10 21:53:30,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764650.0, ans=0.125 2024-08-10 21:53:32,851 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 21:53:37,491 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0 2024-08-10 21:53:44,811 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 21:53:51,112 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 21:54:01,889 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 21:54:07,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2024-08-10 21:54:08,605 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 21:54:09,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.764e+01 3.105e+01 3.573e+01 5.750e+01, threshold=6.210e+01, percent-clipped=0.0 2024-08-10 21:54:24,277 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 21:54:26,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=765050.0, ans=0.0 2024-08-10 21:54:29,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4050, loss[loss=0.09876, beats_loss=0.009822, ecapa_loss=0.0002099, whisper_loss=0.08684, over 17636.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01155, ecapa_loss=0.0002246, whisper_loss=0.09567, over 3879150.18 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:54:29,842 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 21:54:39,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=765150.0, ans=0.125 2024-08-10 21:54:39,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=765150.0, ans=0.125 2024-08-10 21:54:46,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=765250.0, ans=0.0 2024-08-10 21:54:49,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=765250.0, ans=0.125 2024-08-10 21:54:58,155 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 21:54:58,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=765350.0, ans=0.0 2024-08-10 21:55:11,019 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 21:55:14,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=765450.0, ans=0.0 2024-08-10 21:55:17,560 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 21:55:18,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.34 vs. limit=5.0 2024-08-10 21:55:24,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=765550.0, ans=0.0 2024-08-10 21:55:29,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-10 21:55:34,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4100, loss[loss=0.1124, beats_loss=0.01111, ecapa_loss=0.000259, whisper_loss=0.09865, over 22067.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01158, ecapa_loss=0.000225, whisper_loss=0.09548, over 3880246.25 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:55:57,015 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 21:56:00,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=765850.0, ans=0.125 2024-08-10 21:56:15,425 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 21:56:16,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=765950.0, ans=0.125 2024-08-10 21:56:20,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.758e+01 3.048e+01 3.457e+01 5.910e+01, threshold=6.096e+01, percent-clipped=0.0 2024-08-10 21:56:40,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4150, loss[loss=0.1006, beats_loss=0.01119, ecapa_loss=0.0002608, whisper_loss=0.08681, over 16221.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01166, ecapa_loss=0.0002258, whisper_loss=0.09533, over 3887547.01 frames. 
], batch size: 65, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:56:44,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-10 21:56:48,549 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 21:56:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766150.0, ans=0.125 2024-08-10 21:56:57,570 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 21:57:00,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=766250.0, ans=0.09899494936611666 2024-08-10 21:57:06,930 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 21:57:12,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=766350.0, ans=0.125 2024-08-10 21:57:19,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-08-10 21:57:22,421 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 21:57:35,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=766550.0, ans=0.125 2024-08-10 21:57:38,296 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 21:57:46,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4200, loss[loss=0.1428, beats_loss=0.008341, ecapa_loss=0.000276, whisper_loss=0.1317, over 19010.00 frames. 
], tot_loss[loss=0.1097, beats_loss=0.01162, ecapa_loss=0.0002246, whisper_loss=0.0958, over 3899338.08 frames. ], batch size: 74, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:57:47,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-08-10 21:58:09,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-10 21:58:13,404 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 21:58:24,396 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 21:58:31,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.720e+01 3.062e+01 3.636e+01 5.115e+01, threshold=6.123e+01, percent-clipped=0.0 2024-08-10 21:58:46,404 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 21:58:51,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4250, loss[loss=0.09268, beats_loss=0.01347, ecapa_loss=0.0002117, whisper_loss=0.0771, over 21587.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01167, ecapa_loss=0.0002228, whisper_loss=0.09526, over 3906865.86 frames. ], batch size: 93, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:58:52,793 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 21:59:01,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=767150.0, ans=0.0 2024-08-10 21:59:02,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767150.0, ans=0.1 2024-08-10 21:59:05,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=767250.0, ans=0.2 2024-08-10 21:59:31,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767450.0, ans=0.125 2024-08-10 21:59:48,369 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 21:59:51,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=767550.0, ans=0.0 2024-08-10 21:59:57,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4300, loss[loss=0.1229, beats_loss=0.009951, ecapa_loss=0.0002623, whisper_loss=0.1103, over 19669.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01161, ecapa_loss=0.000223, whisper_loss=0.095, over 3910250.03 frames. ], batch size: 82, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:00:22,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2024-08-10 22:00:24,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767850.0, ans=0.1 2024-08-10 22:00:28,084 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 22:00:29,247 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 22:00:31,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-10 22:00:38,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=767950.0, ans=0.125 2024-08-10 22:00:43,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.655e+01 2.968e+01 3.386e+01 7.323e+01, threshold=5.937e+01, percent-clipped=2.0 2024-08-10 22:00:44,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0 2024-08-10 22:00:44,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2024-08-10 22:00:59,994 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 22:01:03,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4350, loss[loss=0.1101, beats_loss=0.006879, ecapa_loss=0.0002377, whisper_loss=0.1009, over 15244.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002253, whisper_loss=0.09479, over 3874392.53 frames. ], batch size: 54, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:01:04,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.33 vs. limit=22.5 2024-08-10 22:01:05,002 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 22:01:08,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:01:22,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=768250.0, ans=0.0 2024-08-10 22:01:29,898 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 22:01:31,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=768350.0, ans=0.2 2024-08-10 22:01:39,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=768350.0, ans=0.2 2024-08-10 22:01:50,429 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-10 22:02:08,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4400, loss[loss=0.1029, beats_loss=0.01327, ecapa_loss=0.0001869, whisper_loss=0.08772, over 19297.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01152, ecapa_loss=0.0002238, whisper_loss=0.09514, over 3888610.57 frames. ], batch size: 76, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:02:23,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=768750.0, ans=0.04949747468305833 2024-08-10 22:02:55,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.875e+01 3.279e+01 3.849e+01 6.433e+01, threshold=6.559e+01, percent-clipped=3.0 2024-08-10 22:03:12,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769050.0, ans=0.1 2024-08-10 22:03:14,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4450, loss[loss=0.1047, beats_loss=0.01244, ecapa_loss=0.0002086, whisper_loss=0.09021, over 14674.00 frames. 
], tot_loss[loss=0.1088, beats_loss=0.01149, ecapa_loss=0.000224, whisper_loss=0.09505, over 3871631.13 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:03:19,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2024-08-10 22:03:29,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=769250.0, ans=0.125 2024-08-10 22:03:30,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=12.0 2024-08-10 22:03:31,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=769250.0, ans=0.2 2024-08-10 22:03:42,707 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 22:03:50,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=769350.0, ans=0.125 2024-08-10 22:03:51,542 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 22:04:02,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. 
limit=10.0 2024-08-10 22:04:04,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769450.0, ans=0.125 2024-08-10 22:04:12,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=769550.0, ans=0.125 2024-08-10 22:04:15,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=769550.0, ans=0.0 2024-08-10 22:04:20,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4500, loss[loss=0.09545, beats_loss=0.01154, ecapa_loss=0.0001993, whisper_loss=0.08192, over 17323.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01158, ecapa_loss=0.0002235, whisper_loss=0.09442, over 3866039.85 frames. ], batch size: 68, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:04:22,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=769650.0, ans=0.125 2024-08-10 22:04:23,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=769650.0, ans=0.2 2024-08-10 22:04:35,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=769750.0, ans=0.0 2024-08-10 22:04:41,175 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 22:04:45,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769850.0, ans=0.1 2024-08-10 22:04:46,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=769850.0, ans=0.125 2024-08-10 22:04:48,617 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
28 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 22:04:52,497 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 22:05:02,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=769950.0, ans=0.125 2024-08-10 22:05:05,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.643e+01 3.102e+01 3.659e+01 7.014e+01, threshold=6.204e+01, percent-clipped=1.0 2024-08-10 22:05:06,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=769950.0, ans=0.125 2024-08-10 22:05:22,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=770050.0, ans=0.0 2024-08-10 22:05:25,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4550, loss[loss=0.1038, beats_loss=0.008843, ecapa_loss=0.0002739, whisper_loss=0.09223, over 19173.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0115, ecapa_loss=0.0002264, whisper_loss=0.09487, over 3863592.23 frames. ], batch size: 73, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:05:33,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=12.0 2024-08-10 22:05:40,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=770250.0, ans=0.125 2024-08-10 22:05:50,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=770350.0, ans=0.05 2024-08-10 22:05:53,437 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 22:06:14,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=770450.0, ans=0.125 2024-08-10 22:06:19,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=770550.0, ans=0.0 2024-08-10 22:06:28,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=770550.0, ans=0.0 2024-08-10 22:06:30,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4600, loss[loss=0.1108, beats_loss=0.01331, ecapa_loss=0.0001613, whisper_loss=0.09587, over 18529.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01147, ecapa_loss=0.0002279, whisper_loss=0.09458, over 3866170.16 frames. ], batch size: 72, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:06:37,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770650.0, ans=0.125 2024-08-10 22:06:41,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=770650.0, ans=0.125 2024-08-10 22:07:05,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2024-08-10 22:07:16,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.935e+01 3.290e+01 3.824e+01 6.429e+01, threshold=6.581e+01, percent-clipped=1.0 2024-08-10 22:07:16,925 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 22:07:36,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4650, loss[loss=0.1022, beats_loss=0.01115, ecapa_loss=0.0002529, whisper_loss=0.08847, over 19666.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01146, ecapa_loss=0.0002275, whisper_loss=0.0949, over 3852648.78 frames. 
], batch size: 76, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:07:36,860 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 22:07:38,158 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 22:07:39,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=771150.0, ans=0.125 2024-08-10 22:07:43,439 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 22:07:49,993 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 22:08:33,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771550.0, ans=0.1 2024-08-10 22:08:43,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4700, loss[loss=0.09249, beats_loss=0.01265, ecapa_loss=0.0002189, whisper_loss=0.07766, over 13574.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01145, ecapa_loss=0.0002266, whisper_loss=0.09427, over 3833574.15 frames. ], batch size: 53, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:08:51,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771650.0, ans=0.1 2024-08-10 22:08:51,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.08 vs. limit=15.0 2024-08-10 22:08:57,674 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 22:09:03,441 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 22:09:07,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=771750.0, ans=0.2 2024-08-10 22:09:16,915 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 22:09:22,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=771950.0, ans=0.07 2024-08-10 22:09:29,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.713e+01 3.049e+01 3.532e+01 5.514e+01, threshold=6.097e+01, percent-clipped=0.0 2024-08-10 22:09:31,160 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 22:09:49,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4750, loss[loss=0.1224, beats_loss=0.01028, ecapa_loss=0.0002271, whisper_loss=0.1098, over 22399.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01146, ecapa_loss=0.0002254, whisper_loss=0.09474, over 3854118.49 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:09:49,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=772150.0, ans=0.0 2024-08-10 22:09:54,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=15.0 2024-08-10 22:10:10,958 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 22:10:15,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772350.0, ans=0.125 2024-08-10 22:10:19,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=772350.0, ans=0.0 2024-08-10 22:10:27,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772350.0, ans=0.1 2024-08-10 22:10:27,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2024-08-10 22:10:36,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=772450.0, ans=15.0 2024-08-10 22:10:48,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=772550.0, ans=0.0 2024-08-10 22:10:49,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=772550.0, ans=0.0 2024-08-10 22:10:55,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4800, loss[loss=0.1048, beats_loss=0.009859, ecapa_loss=0.0001808, whisper_loss=0.09311, over 14894.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0002266, whisper_loss=0.09433, over 3875436.49 frames. 
], batch size: 55, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:10:59,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772650.0, ans=0.125 2024-08-10 22:11:14,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=772750.0, ans=0.125 2024-08-10 22:11:41,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.673e+01 3.089e+01 3.492e+01 5.456e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-10 22:11:45,456 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 22:12:00,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4850, loss[loss=0.1041, beats_loss=0.0116, ecapa_loss=0.0002591, whisper_loss=0.08994, over 19373.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01162, ecapa_loss=0.0002261, whisper_loss=0.09427, over 3880426.92 frames. ], batch size: 82, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:12:21,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=773250.0, ans=0.0 2024-08-10 22:12:41,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=773450.0, ans=0.2 2024-08-10 22:12:49,977 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 22:12:56,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=773550.0, ans=0.125 2024-08-10 22:13:00,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=773550.0, ans=0.125 2024-08-10 22:13:06,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4900, loss[loss=0.132, beats_loss=0.01065, ecapa_loss=0.0002529, whisper_loss=0.1189, over 22754.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01161, ecapa_loss=0.0002248, whisper_loss=0.09447, over 3879362.04 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:13:50,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=773950.0, ans=0.0 2024-08-10 22:13:53,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.882e+01 3.230e+01 4.059e+01 7.454e+01, threshold=6.460e+01, percent-clipped=3.0 2024-08-10 22:14:07,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774050.0, ans=0.125 2024-08-10 22:14:11,261 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 22:14:12,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 4950, loss[loss=0.1212, beats_loss=0.009954, ecapa_loss=0.0002266, whisper_loss=0.109, over 22672.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01167, ecapa_loss=0.0002252, whisper_loss=0.09436, over 3890921.03 frames. ], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:14:13,744 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 22:14:43,070 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
29 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 22:14:44,367 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 22:14:47,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=774350.0, ans=0.0 2024-08-10 22:14:51,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774450.0, ans=0.125 2024-08-10 22:14:53,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=774450.0, ans=0.125 2024-08-10 22:14:56,168 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 22:14:56,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=774450.0, ans=0.125 2024-08-10 22:15:18,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5000, loss[loss=0.1258, beats_loss=0.01093, ecapa_loss=0.0002607, whisper_loss=0.1123, over 22553.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01159, ecapa_loss=0.0002262, whisper_loss=0.09468, over 3868507.91 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:15:30,281 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 22:15:34,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774750.0, ans=0.1 2024-08-10 22:15:41,968 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 22:16:00,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=774950.0, ans=0.0 2024-08-10 22:16:04,683 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-10 22:16:07,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.646e+01 2.904e+01 3.171e+01 4.689e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-10 22:16:17,668 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 22:16:29,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=775150.0, ans=0.0 2024-08-10 22:16:30,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5050, loss[loss=0.1145, beats_loss=0.01096, ecapa_loss=0.0002144, whisper_loss=0.1014, over 23460.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01163, ecapa_loss=0.0002257, whisper_loss=0.09484, over 3904161.60 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:16:30,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=775150.0, ans=0.125 2024-08-10 22:16:39,481 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:16:44,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=775250.0, ans=0.0 2024-08-10 22:16:47,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=775250.0, ans=0.0 2024-08-10 22:16:50,102 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 22:16:53,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=775250.0, ans=0.125 2024-08-10 22:16:58,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=775250.0, ans=0.1 2024-08-10 22:16:58,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=12.0 2024-08-10 22:17:06,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=775350.0, ans=0.0 2024-08-10 22:17:10,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775350.0, ans=0.1 2024-08-10 22:17:15,757 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 22:17:38,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=775550.0, ans=0.0 2024-08-10 22:17:47,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5100, loss[loss=0.1273, beats_loss=0.0123, ecapa_loss=0.0001978, whisper_loss=0.113, over 22178.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01157, ecapa_loss=0.0002247, whisper_loss=0.09619, over 3917635.91 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:17:50,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=775650.0, ans=0.125 2024-08-10 22:17:52,395 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 22:17:57,628 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 22:18:15,227 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
29 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 22:18:51,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.763e+01 3.180e+01 3.560e+01 6.035e+01, threshold=6.359e+01, percent-clipped=1.0 2024-08-10 22:18:52,015 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 22:18:57,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=775950.0, ans=0.125 2024-08-10 22:18:59,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=776050.0, ans=0.125 2024-08-10 22:19:16,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5150, loss[loss=0.1212, beats_loss=0.01019, ecapa_loss=0.0001917, whisper_loss=0.1091, over 22294.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01157, ecapa_loss=0.0002242, whisper_loss=0.09604, over 3918641.01 frames. ], batch size: 87, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:19:25,129 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 22:19:38,653 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-10 22:19:52,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=776250.0, ans=0.125 2024-08-10 22:20:23,051 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 22:20:52,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776550.0, ans=0.1 2024-08-10 22:21:03,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5200, loss[loss=0.09412, beats_loss=0.01389, ecapa_loss=0.0001931, whisper_loss=0.0783, over 22721.00 frames. 
], tot_loss[loss=0.1094, beats_loss=0.01173, ecapa_loss=0.0002222, whisper_loss=0.09546, over 3932850.85 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:21:09,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=776650.0, ans=0.125 2024-08-10 22:21:43,060 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 22:22:09,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=776950.0, ans=0.0 2024-08-10 22:22:11,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.788e+01 3.076e+01 3.692e+01 5.822e+01, threshold=6.152e+01, percent-clipped=0.0 2024-08-10 22:22:42,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5250, loss[loss=0.07949, beats_loss=0.01585, ecapa_loss=0.0001454, whisper_loss=0.06219, over 14643.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01172, ecapa_loss=0.0002205, whisper_loss=0.09482, over 3923311.71 frames. ], batch size: 55, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:22:50,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=777150.0, ans=0.04949747468305833 2024-08-10 22:22:56,925 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 22:23:36,448 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-10 22:23:41,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777350.0, ans=0.0 2024-08-10 22:23:48,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.06 vs. 
limit=22.5 2024-08-10 22:23:52,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=777450.0, ans=0.0 2024-08-10 22:24:12,630 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 22:24:15,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=777550.0, ans=0.125 2024-08-10 22:24:30,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=777550.0, ans=0.07 2024-08-10 22:24:33,707 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 22:24:38,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5300, loss[loss=0.0957, beats_loss=0.009231, ecapa_loss=0.0002517, whisper_loss=0.08395, over 13656.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01166, ecapa_loss=0.0002199, whisper_loss=0.0946, over 3871329.26 frames. ], batch size: 54, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:24:45,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-10 22:25:12,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-10 22:25:14,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777750.0, ans=0.1 2024-08-10 22:25:27,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=777850.0, ans=0.1 2024-08-10 22:25:40,802 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
30 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 22:25:45,562 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 22:26:02,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.845e+01 3.130e+01 3.652e+01 5.218e+01, threshold=6.259e+01, percent-clipped=0.0 2024-08-10 22:26:23,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=778050.0, ans=0.1 2024-08-10 22:26:38,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5350, loss[loss=0.1117, beats_loss=0.00888, ecapa_loss=0.0002062, whisper_loss=0.1008, over 19146.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.00022, whisper_loss=0.09482, over 3882079.99 frames. ], batch size: 73, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:26:43,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=778150.0, ans=0.0 2024-08-10 22:27:19,981 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 22:28:10,024 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 22:28:27,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5400, loss[loss=0.1272, beats_loss=0.01217, ecapa_loss=0.0001982, whisper_loss=0.113, over 15889.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01159, ecapa_loss=0.0002185, whisper_loss=0.09479, over 3871024.64 frames. ], batch size: 59, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:29:16,671 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 22:29:24,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.711e+01 3.100e+01 3.573e+01 5.377e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 22:29:50,692 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5450, loss[loss=0.09528, beats_loss=0.01499, ecapa_loss=0.0001942, whisper_loss=0.07835, over 16921.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01154, ecapa_loss=0.000219, whisper_loss=0.09471, over 3875840.23 frames. ], batch size: 70, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:30:29,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-10 22:30:50,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=779450.0, ans=0.125 2024-08-10 22:30:57,678 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 22:30:58,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=779450.0, ans=0.04949747468305833 2024-08-10 22:31:16,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779550.0, ans=0.1 2024-08-10 22:31:23,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5500, loss[loss=0.1188, beats_loss=0.01269, ecapa_loss=0.0002562, whisper_loss=0.1036, over 15740.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01157, ecapa_loss=0.0002184, whisper_loss=0.09467, over 3857170.57 frames. ], batch size: 66, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:31:24,256 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 22:31:24,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=779650.0, ans=0.0 2024-08-10 22:31:27,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=779650.0, ans=0.125 2024-08-10 22:31:29,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=779650.0, ans=0.0 2024-08-10 22:31:47,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=779750.0, ans=0.0 2024-08-10 22:31:51,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=779750.0, ans=15.0 2024-08-10 22:32:18,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779850.0, ans=0.125 2024-08-10 22:32:29,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.640e+01 3.152e+01 3.892e+01 6.209e+01, threshold=6.304e+01, percent-clipped=1.0 2024-08-10 22:32:50,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=780050.0, ans=0.0 2024-08-10 22:32:58,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5550, loss[loss=0.0982, beats_loss=0.01147, ecapa_loss=0.000216, whisper_loss=0.08457, over 20686.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0116, ecapa_loss=0.0002204, whisper_loss=0.09482, over 3871483.72 frames. ], batch size: 82, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:33:21,345 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 14 from Vox, 25 from AS 2024-08-10 22:33:53,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=780450.0, ans=0.125 2024-08-10 22:33:54,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2024-08-10 22:33:56,795 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 22:34:03,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=780450.0, ans=0.025 2024-08-10 22:34:13,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=780550.0, ans=0.2 2024-08-10 22:34:20,578 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 15 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 22:34:32,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 from AS 2024-08-10 22:34:33,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5600, loss[loss=0.08664, beats_loss=0.01352, ecapa_loss=0.0001862, whisper_loss=0.07126, over 22090.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01163, ecapa_loss=0.0002211, whisper_loss=0.09431, over 3890775.47 frames. 
], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:34:40,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=780650.0, ans=0.125 2024-08-10 22:34:52,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=780750.0, ans=0.125 2024-08-10 22:34:52,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=780750.0, ans=0.125 2024-08-10 22:35:02,549 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 28 from Vox, 42 from AS 2024-08-10 22:35:02,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=780750.0, ans=0.0 2024-08-10 22:35:12,121 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 19 from Vox, 20 from AS 2024-08-10 22:35:21,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0 2024-08-10 22:35:22,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=780850.0, ans=0.125 2024-08-10 22:35:26,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=780950.0, ans=0.125 2024-08-10 22:35:33,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.54 vs. 
limit=22.5 2024-08-10 22:35:36,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.826e+01 3.158e+01 3.731e+01 5.525e+01, threshold=6.316e+01, percent-clipped=0.0 2024-08-10 22:35:39,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=780950.0, ans=0.125 2024-08-10 22:35:49,342 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 from AS 2024-08-10 22:35:51,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=781050.0, ans=10.0 2024-08-10 22:35:59,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781050.0, ans=0.1 2024-08-10 22:36:01,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=781050.0, ans=0.125 2024-08-10 22:36:04,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5650, loss[loss=0.1186, beats_loss=0.009953, ecapa_loss=0.0002091, whisper_loss=0.1066, over 23528.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01172, ecapa_loss=0.0002208, whisper_loss=0.09369, over 3929260.94 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:36:09,081 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 22:36:40,760 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 from AS 2024-08-10 22:36:44,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=781350.0, ans=0.07 2024-08-10 22:36:51,239 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 22:37:14,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=781450.0, ans=0.0 2024-08-10 22:37:19,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.11 vs. limit=10.0 2024-08-10 22:37:35,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5700, loss[loss=0.1171, beats_loss=0.01111, ecapa_loss=0.0002324, whisper_loss=0.1037, over 22134.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01163, ecapa_loss=0.0002213, whisper_loss=0.09429, over 3934822.61 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:37:57,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781750.0, ans=0.125 2024-08-10 22:38:09,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=781750.0, ans=0.125 2024-08-10 22:38:12,728 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 30 from LS+wenet, 12 from Vox, 23 from AS 2024-08-10 22:38:22,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. 
limit=15.0 2024-08-10 22:38:39,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=781950.0, ans=0.125 2024-08-10 22:38:40,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.917e+01 3.187e+01 3.836e+01 6.311e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 22:38:43,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781950.0, ans=0.0 2024-08-10 22:38:43,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=781950.0, ans=0.2 2024-08-10 22:38:45,493 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 30 from LS+wenet, 27 from Vox, 39 from AS 2024-08-10 22:38:53,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-08-10 22:39:01,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=782050.0, ans=10.0 2024-08-10 22:39:06,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5750, loss[loss=0.08787, beats_loss=0.0123, ecapa_loss=0.0001936, whisper_loss=0.07363, over 20106.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01162, ecapa_loss=0.0002228, whisper_loss=0.09533, over 3957421.73 frames. ], batch size: 80, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:39:26,577 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 20 from Vox, 21 from AS 2024-08-10 22:39:40,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=782250.0, ans=0.0 2024-08-10 22:39:42,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=782350.0, ans=0.5 2024-08-10 22:39:44,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=782350.0, ans=0.125 2024-08-10 22:39:50,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2024-08-10 22:39:51,913 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 from AS 2024-08-10 22:39:58,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782350.0, ans=0.0 2024-08-10 22:40:13,541 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 15 from Vox, 19 from AS 2024-08-10 22:40:18,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=782450.0, ans=0.125 2024-08-10 22:40:39,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5800, loss[loss=0.06524, beats_loss=0.01286, ecapa_loss=0.0001844, whisper_loss=0.05053, over 14750.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01165, ecapa_loss=0.000223, whisper_loss=0.09458, over 3935983.10 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:40:48,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=12.0 2024-08-10 22:41:16,285 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 22:41:18,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782850.0, ans=0.1 2024-08-10 22:41:25,852 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 11 from Vox, 27 from AS 2024-08-10 22:41:31,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=782850.0, ans=0.2 2024-08-10 22:41:39,182 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 17 from Vox, 35 from AS 2024-08-10 22:41:44,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.672e+01 3.034e+01 3.531e+01 4.962e+01, threshold=6.068e+01, percent-clipped=0.0 2024-08-10 22:41:55,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=783050.0, ans=0.125 2024-08-10 22:42:01,557 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 from AS 2024-08-10 22:42:04,215 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 22:42:11,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-10 22:42:12,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5850, loss[loss=0.1221, beats_loss=0.01237, ecapa_loss=0.00024, whisper_loss=0.1073, over 22642.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01162, ecapa_loss=0.0002216, whisper_loss=0.09486, over 3908945.59 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:42:14,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.26 vs. 
limit=15.0 2024-08-10 22:42:20,621 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 22:42:23,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=783150.0, ans=0.0 2024-08-10 22:42:24,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-10 22:42:28,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=12.0 2024-08-10 22:42:48,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=783350.0, ans=0.125 2024-08-10 22:43:00,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=783350.0, ans=0.125 2024-08-10 22:43:10,470 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 from AS 2024-08-10 22:43:29,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=783550.0, ans=0.0 2024-08-10 22:43:41,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5900, loss[loss=0.1212, beats_loss=0.01047, ecapa_loss=0.0002154, whisper_loss=0.1086, over 21600.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01167, ecapa_loss=0.0002219, whisper_loss=0.09425, over 3908986.58 frames. ], batch size: 87, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:43:58,303 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 22:43:58,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=783650.0, ans=0.2 2024-08-10 22:44:39,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=783950.0, ans=0.125 2024-08-10 22:44:47,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.739e+01 3.064e+01 3.610e+01 4.850e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-10 22:44:52,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2024-08-10 22:44:58,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784050.0, ans=0.1 2024-08-10 22:44:59,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=784050.0, ans=0.0 2024-08-10 22:45:01,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=784050.0, ans=0.5 2024-08-10 22:45:14,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=784150.0, ans=0.125 2024-08-10 22:45:15,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 5950, loss[loss=0.1211, beats_loss=0.0121, ecapa_loss=0.0001971, whisper_loss=0.107, over 23315.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01163, ecapa_loss=0.0002226, whisper_loss=0.09417, over 3899114.80 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:45:19,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=784150.0, ans=0.125 2024-08-10 22:45:33,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=784250.0, ans=0.05 2024-08-10 22:45:41,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-10 22:45:49,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0 2024-08-10 22:46:24,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=784450.0, ans=0.125 2024-08-10 22:46:35,076 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 22:46:46,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6000, loss[loss=0.1053, beats_loss=0.01283, ecapa_loss=0.0001696, whisper_loss=0.09077, over 23260.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01169, ecapa_loss=0.0002219, whisper_loss=0.09391, over 3876884.35 frames. ], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:46:46,113 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-10 22:47:25,495 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on ASR_libri: loss=0.2592, beats_loss=0, ecapa_loss=0.0006893, whisper_loss=0.2523, over 922467.00 frames. 2024-08-10 22:47:44,021 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on SV_voxceleb1: loss=0.005715, beats_loss=0, ecapa_loss=0.0005715, whisper_loss=0, over 939242.00 frames. 
2024-08-10 22:48:51,380 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.5516, 1.3614, 1.3575, 1.2219, 1.6492, 1.2117, 1.4693, 1.3275], device='cuda:3') 2024-08-10 22:49:35,374 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on AT_audioset: loss=0.02616, beats_loss=0.02616, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 22:49:35,378 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-10 22:49:53,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=784750.0, ans=0.125 2024-08-10 22:50:21,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=784850.0, ans=0.125 2024-08-10 22:50:31,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=784950.0, ans=0.125 2024-08-10 22:50:33,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.532e+01 2.986e+01 3.661e+01 5.128e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-10 22:50:41,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-10 22:50:55,530 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 22:51:00,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6050, loss[loss=0.07996, beats_loss=0.009953, ecapa_loss=0.0002719, whisper_loss=0.06728, over 14426.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01168, ecapa_loss=0.0002219, whisper_loss=0.09392, over 3863529.52 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:51:02,057 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
29 from LS+wenet, 13 from Vox, 22 from AS 2024-08-10 22:51:10,994 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 22:51:18,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.73 vs. limit=15.0 2024-08-10 22:51:38,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-08-10 22:52:01,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0 2024-08-10 22:52:04,776 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 22:52:35,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6100, loss[loss=0.1132, beats_loss=0.01302, ecapa_loss=0.0001941, whisper_loss=0.09828, over 23032.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.0002235, whisper_loss=0.09451, over 3882845.16 frames. ], batch size: 91, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:52:53,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=785750.0, ans=0.0 2024-08-10 22:53:12,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=785850.0, ans=0.125 2024-08-10 22:53:26,730 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 22:53:30,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-10 22:53:32,927 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
17 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 22:53:36,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.867e+01 3.222e+01 3.705e+01 5.709e+01, threshold=6.445e+01, percent-clipped=0.0 2024-08-10 22:53:38,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-10 22:53:42,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=785950.0, ans=0.0 2024-08-10 22:54:05,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6150, loss[loss=0.1013, beats_loss=0.014, ecapa_loss=0.0001597, whisper_loss=0.08572, over 20095.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01162, ecapa_loss=0.0002245, whisper_loss=0.09404, over 3894510.72 frames. ], batch size: 79, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:54:31,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=786250.0, ans=0.125 2024-08-10 22:54:33,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=786250.0, ans=0.125 2024-08-10 22:54:45,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=786350.0, ans=0.0 2024-08-10 22:54:51,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.47 vs. 
limit=15.0 2024-08-10 22:54:59,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786450.0, ans=0.1 2024-08-10 22:55:30,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=786550.0, ans=0.125 2024-08-10 22:55:32,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6200, loss[loss=0.1151, beats_loss=0.01265, ecapa_loss=0.0001938, whisper_loss=0.1005, over 17431.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01167, ecapa_loss=0.0002227, whisper_loss=0.09364, over 3902187.83 frames. ], batch size: 68, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:55:41,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0 2024-08-10 22:56:00,432 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 28 from Vox, 31 from AS 2024-08-10 22:56:14,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=786850.0, ans=0.0 2024-08-10 22:56:14,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=786850.0, ans=0.025 2024-08-10 22:56:18,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=8.0 2024-08-10 22:56:18,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786850.0, ans=0.1 2024-08-10 22:56:28,776 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 12 from Vox, 33 from AS 2024-08-10 22:56:31,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.713e+01 3.021e+01 3.323e+01 5.362e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-10 22:56:37,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-10 22:56:57,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6250, loss[loss=0.1085, beats_loss=0.009542, ecapa_loss=0.0002299, whisper_loss=0.09661, over 21374.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0117, ecapa_loss=0.000223, whisper_loss=0.0935, over 3924104.96 frames. ], batch size: 86, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:57:03,112 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 22:57:14,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787250.0, ans=0.125 2024-08-10 22:57:21,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-08-10 22:57:26,204 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 22:57:44,947 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-10 22:57:53,331 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-10 22:58:17,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=787550.0, ans=0.125 2024-08-10 22:58:17,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=787550.0, ans=0.2 2024-08-10 22:58:20,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6300, loss[loss=0.09604, beats_loss=0.01244, ecapa_loss=0.0002749, whisper_loss=0.08085, over 13599.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01165, ecapa_loss=0.0002248, whisper_loss=0.0936, over 3898552.76 frames. ], batch size: 57, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:58:39,409 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 27 from Vox, 38 from AS 2024-08-10 22:58:49,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=787750.0, ans=0.07 2024-08-10 22:58:53,563 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:59:19,127 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 22:59:19,398 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.630e-01 2024-08-10 22:59:19,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.845e+01 3.065e+01 3.583e+01 5.394e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-10 22:59:39,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=788050.0, ans=0.0 2024-08-10 22:59:40,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=12.0 2024-08-10 22:59:44,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6350, loss[loss=0.09831, beats_loss=0.01219, ecapa_loss=0.0002487, whisper_loss=0.08363, over 15135.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01162, ecapa_loss=0.0002263, whisper_loss=0.09368, over 3905590.74 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:59:56,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=788150.0, ans=0.125 2024-08-10 22:59:56,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=12.0 2024-08-10 23:00:01,684 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-10 23:00:14,539 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 23:00:32,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=788450.0, ans=0.0 2024-08-10 23:00:46,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=788450.0, ans=0.0 2024-08-10 23:00:49,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=788550.0, ans=0.0 2024-08-10 23:01:00,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=788550.0, ans=0.125 2024-08-10 23:01:09,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6400, loss[loss=0.1127, beats_loss=0.009909, ecapa_loss=0.0002117, whisper_loss=0.1007, over 19223.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01167, ecapa_loss=0.0002237, whisper_loss=0.0936, over 3923602.21 frames. 
], batch size: 75, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:01:21,910 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 from AS 2024-08-10 23:01:22,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=788650.0, ans=0.125 2024-08-10 23:02:00,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=788950.0, ans=0.04949747468305833 2024-08-10 23:02:07,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.821e+01 3.135e+01 3.560e+01 4.755e+01, threshold=6.269e+01, percent-clipped=0.0 2024-08-10 23:02:16,593 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 23:02:17,761 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 11 from Vox, 25 from AS 2024-08-10 23:02:19,788 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 23:02:32,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6450, loss[loss=0.1069, beats_loss=0.01062, ecapa_loss=0.0002171, whisper_loss=0.09414, over 17420.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0117, ecapa_loss=0.0002242, whisper_loss=0.09366, over 3909970.58 frames. ], batch size: 68, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:02:55,593 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 21 from Vox, 21 from AS 2024-08-10 23:03:06,426 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 16 from Vox, 43 from AS 2024-08-10 23:03:10,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=789350.0, ans=0.125 2024-08-10 23:03:40,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=789550.0, ans=0.125 2024-08-10 23:03:41,585 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 from AS 2024-08-10 23:03:54,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6500, loss[loss=0.1284, beats_loss=0.01112, ecapa_loss=0.0002079, whisper_loss=0.1152, over 23516.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01167, ecapa_loss=0.0002233, whisper_loss=0.09423, over 3953679.98 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:04:20,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=789750.0, ans=0.125 2024-08-10 23:04:47,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2024-08-10 23:04:52,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=789950.0, ans=0.04949747468305833 2024-08-10 23:04:55,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.899e+01 3.223e+01 3.887e+01 5.763e+01, threshold=6.447e+01, percent-clipped=0.0 2024-08-10 23:04:59,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.98 vs. limit=22.5 2024-08-10 23:05:06,631 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 23:05:19,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6550, loss[loss=0.09584, beats_loss=0.01077, ecapa_loss=0.0002408, whisper_loss=0.08267, over 18236.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01158, ecapa_loss=0.0002227, whisper_loss=0.09543, over 3948292.62 frames. ], batch size: 74, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:05:20,582 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 23:05:23,155 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 23:05:31,758 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 23:05:41,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=790250.0, ans=0.2 2024-08-10 23:05:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=790350.0, ans=0.0 2024-08-10 23:05:59,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=790350.0, ans=0.0 2024-08-10 23:06:07,175 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 23:06:19,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790450.0, ans=0.125 2024-08-10 23:06:35,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=790550.0, ans=0.05 2024-08-10 23:06:39,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=790550.0, ans=0.125 2024-08-10 23:06:42,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6600, loss[loss=0.1178, beats_loss=0.01045, ecapa_loss=0.0002462, whisper_loss=0.1049, over 21832.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01146, ecapa_loss=0.0002247, whisper_loss=0.09556, over 3973187.60 frames. ], batch size: 86, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:06:51,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=790650.0, ans=0.125 2024-08-10 23:07:38,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.957e+01 3.211e+01 3.827e+01 6.878e+01, threshold=6.422e+01, percent-clipped=2.0 2024-08-10 23:07:51,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791050.0, ans=0.1 2024-08-10 23:08:02,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6650, loss[loss=0.08467, beats_loss=0.01405, ecapa_loss=0.0001825, whisper_loss=0.06879, over 20394.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01145, ecapa_loss=0.0002243, whisper_loss=0.09585, over 3945790.94 frames. ], batch size: 85, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:08:22,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. 
limit=15.0 2024-08-10 23:08:49,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=791350.0, ans=0.0 2024-08-10 23:09:11,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791550.0, ans=0.1 2024-08-10 23:09:19,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=791550.0, ans=15.0 2024-08-10 23:09:21,929 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 23:09:22,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6700, loss[loss=0.1265, beats_loss=0.008872, ecapa_loss=0.0002886, whisper_loss=0.1147, over 16296.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0115, ecapa_loss=0.0002241, whisper_loss=0.09538, over 3911460.28 frames. ], batch size: 67, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:09:38,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.64 vs. limit=15.0 2024-08-10 23:09:43,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=791750.0, ans=0.0 2024-08-10 23:09:43,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=791750.0, ans=0.0 2024-08-10 23:09:47,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=791750.0, ans=0.125 2024-08-10 23:09:54,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2024-08-10 23:10:05,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=791850.0, ans=0.2 2024-08-10 23:10:07,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.13 vs. limit=10.0 2024-08-10 23:10:15,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.716e+01 3.104e+01 3.606e+01 5.024e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-10 23:10:23,461 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 23:10:26,293 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 23:10:33,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5 2024-08-10 23:10:35,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792050.0, ans=0.1 2024-08-10 23:10:37,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6750, loss[loss=0.1062, beats_loss=0.0133, ecapa_loss=0.0002143, whisper_loss=0.09079, over 21621.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01153, ecapa_loss=0.0002254, whisper_loss=0.09484, over 3888439.85 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:10:39,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=792150.0, ans=0.125 2024-08-10 23:11:02,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2024-08-10 23:11:15,571 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 23:11:31,760 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 23:11:33,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=792450.0, ans=0.2 2024-08-10 23:11:35,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2024-08-10 23:11:54,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6800, loss[loss=0.09788, beats_loss=0.01383, ecapa_loss=0.0002616, whisper_loss=0.08144, over 21820.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01164, ecapa_loss=0.0002252, whisper_loss=0.09434, over 3895561.82 frames. ], batch size: 92, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:12:02,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=792650.0, ans=0.0 2024-08-10 23:12:31,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=792850.0, ans=0.2 2024-08-10 23:12:34,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=792850.0, ans=0.95 2024-08-10 23:12:43,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792950.0, ans=0.125 2024-08-10 23:12:46,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.823e+01 3.224e+01 3.746e+01 6.225e+01, threshold=6.449e+01, percent-clipped=1.0 2024-08-10 23:13:10,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6850, loss[loss=0.07948, beats_loss=0.01268, ecapa_loss=0.0001665, whisper_loss=0.06514, over 15327.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01156, ecapa_loss=0.0002248, whisper_loss=0.09419, over 3852341.32 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:13:17,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=793150.0, ans=0.125 2024-08-10 23:13:30,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=22.5 2024-08-10 23:13:32,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=793250.0, ans=0.125 2024-08-10 23:13:36,584 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 23:13:44,692 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 23:13:56,620 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 23:14:07,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793450.0, ans=0.125 2024-08-10 23:14:11,500 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 23:14:11,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=793550.0, ans=0.125 2024-08-10 23:14:20,914 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 23:14:28,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6900, loss[loss=0.1088, beats_loss=0.01162, ecapa_loss=0.0001867, whisper_loss=0.09532, over 18848.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01151, ecapa_loss=0.0002242, whisper_loss=0.09497, over 3857549.56 frames. 
], batch size: 74, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:14:28,575 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 23:14:31,794 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:14:54,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2024-08-10 23:15:07,565 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 23:15:20,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.839e+01 3.157e+01 3.612e+01 7.302e+01, threshold=6.314e+01, percent-clipped=1.0 2024-08-10 23:15:42,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 6950, loss[loss=0.1001, beats_loss=0.01026, ecapa_loss=0.0002306, whisper_loss=0.08757, over 17642.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.0002221, whisper_loss=0.09492, over 3898307.30 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:15:47,274 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.047e+00 2024-08-10 23:16:08,916 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.145e-02 2024-08-10 23:16:19,310 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-10 23:16:29,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794450.0, ans=0.125 2024-08-10 23:16:41,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=794550.0, ans=0.125 2024-08-10 23:16:54,603 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 23:16:54,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=794550.0, ans=0.2 2024-08-10 23:16:57,098 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7000, loss[loss=0.1018, beats_loss=0.01171, ecapa_loss=0.0002298, whisper_loss=0.08777, over 18129.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01157, ecapa_loss=0.0002223, whisper_loss=0.09492, over 3897490.83 frames. ], batch size: 75, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:17:18,338 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 23:17:23,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794750.0, ans=0.125 2024-08-10 23:17:31,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=794850.0, ans=0.125 2024-08-10 23:17:36,843 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 23:17:49,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.681e+01 2.983e+01 3.369e+01 6.385e+01, threshold=5.967e+01, percent-clipped=1.0 2024-08-10 23:17:56,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=795050.0, ans=0.0 2024-08-10 23:17:58,547 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 23:18:02,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.23 vs. limit=10.0 2024-08-10 23:18:12,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7050, loss[loss=0.1245, beats_loss=0.01129, ecapa_loss=0.0001928, whisper_loss=0.1112, over 21683.00 frames. 
], tot_loss[loss=0.1088, beats_loss=0.01152, ecapa_loss=0.0002223, whisper_loss=0.09504, over 3884950.42 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:18:13,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=795150.0, ans=0.125 2024-08-10 23:18:19,131 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 23:18:22,041 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 23:18:36,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=795250.0, ans=0.0 2024-08-10 23:19:00,962 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 23:19:14,151 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 23:19:28,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=795650.0, ans=0.125 2024-08-10 23:19:28,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7100, loss[loss=0.101, beats_loss=0.01297, ecapa_loss=0.0001964, whisper_loss=0.08608, over 22673.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002196, whisper_loss=0.09479, over 3906509.74 frames. ], batch size: 95, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:19:37,464 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 23:19:50,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795750.0, ans=0.1 2024-08-10 23:19:52,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=795750.0, ans=0.125 2024-08-10 23:19:56,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795750.0, ans=0.1 2024-08-10 23:20:04,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=795850.0, ans=0.1 2024-08-10 23:20:22,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.587e+01 2.924e+01 3.368e+01 5.025e+01, threshold=5.848e+01, percent-clipped=0.0 2024-08-10 23:20:26,586 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 23:20:35,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=12.0 2024-08-10 23:20:37,922 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-10 23:20:41,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=796050.0, ans=0.07 2024-08-10 23:20:46,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7150, loss[loss=0.08926, beats_loss=0.01169, ecapa_loss=0.0002531, whisper_loss=0.07504, over 17679.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01162, ecapa_loss=0.0002189, whisper_loss=0.09501, over 3924572.63 frames. ], batch size: 72, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:20:50,847 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 23:21:18,745 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 23:21:19,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=796350.0, ans=0.125 2024-08-10 23:21:24,718 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 23:21:39,079 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 23:21:42,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=796450.0, ans=0.2 2024-08-10 23:21:44,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=796450.0, ans=0.125 2024-08-10 23:22:00,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7200, loss[loss=0.08832, beats_loss=0.01329, ecapa_loss=0.0002418, whisper_loss=0.07261, over 21546.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01169, ecapa_loss=0.0002207, whisper_loss=0.09474, over 3956340.24 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:22:11,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=796650.0, ans=0.09899494936611666 2024-08-10 23:22:16,116 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:22:28,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-08-10 23:22:32,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.05 vs. 
limit=15.0 2024-08-10 23:22:39,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0 2024-08-10 23:22:39,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-10 23:22:44,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=796850.0, ans=0.2 2024-08-10 23:22:48,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=796950.0, ans=0.5 2024-08-10 23:22:54,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.940e+01 3.360e+01 3.850e+01 6.660e+01, threshold=6.719e+01, percent-clipped=3.0 2024-08-10 23:23:01,793 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 23:23:06,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-10 23:23:17,511 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7250, loss[loss=0.1021, beats_loss=0.01296, ecapa_loss=0.0002196, whisper_loss=0.08696, over 20393.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01162, ecapa_loss=0.0002223, whisper_loss=0.09477, over 3956312.39 frames. ], batch size: 86, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:23:29,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.80 vs. limit=22.5 2024-08-10 23:23:37,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=797250.0, ans=0.0 2024-08-10 23:23:47,263 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 23:23:59,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2024-08-10 23:24:02,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797450.0, ans=0.1 2024-08-10 23:24:04,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=797450.0, ans=0.125 2024-08-10 23:24:20,198 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-10 23:24:29,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797550.0, ans=0.1 2024-08-10 23:24:31,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7300, loss[loss=0.09853, beats_loss=0.01303, ecapa_loss=0.0001823, whisper_loss=0.08368, over 17627.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01157, ecapa_loss=0.0002221, whisper_loss=0.09576, over 3918677.07 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:24:31,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=797650.0, ans=0.0 2024-08-10 23:24:31,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797650.0, ans=0.125 2024-08-10 23:24:44,259 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 23:24:56,636 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 23:25:00,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=797850.0, ans=0.125 2024-08-10 23:25:07,640 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 23:25:09,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=797850.0, ans=0.125 2024-08-10 23:25:18,666 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 23:25:22,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.736e+01 3.135e+01 3.639e+01 8.330e+01, threshold=6.270e+01, percent-clipped=2.0 2024-08-10 23:25:26,878 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 23:25:43,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7350, loss[loss=0.1046, beats_loss=0.01218, ecapa_loss=0.0001925, whisper_loss=0.09049, over 17621.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01164, ecapa_loss=0.0002215, whisper_loss=0.0947, over 3893970.70 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:25:47,909 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 23:26:04,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=798250.0, ans=0.125 2024-08-10 23:26:09,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=798250.0, ans=0.2 2024-08-10 23:26:29,101 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 23:26:32,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=798450.0, ans=0.0 2024-08-10 23:26:37,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=798450.0, ans=0.0 2024-08-10 23:26:48,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=798550.0, ans=0.125 2024-08-10 23:26:54,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7400, loss[loss=0.1068, beats_loss=0.01089, ecapa_loss=0.0002007, whisper_loss=0.09387, over 14778.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01162, ecapa_loss=0.0002208, whisper_loss=0.09477, over 3881075.98 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:26:58,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-10 23:27:06,831 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 23:27:20,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=798850.0, ans=0.1 2024-08-10 23:27:31,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=798850.0, ans=15.0 2024-08-10 23:27:35,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=12.0 2024-08-10 23:27:42,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.647e+01 3.053e+01 3.534e+01 7.826e+01, threshold=6.106e+01, percent-clipped=2.0 2024-08-10 23:27:43,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-10 23:27:52,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=799050.0, ans=0.125 2024-08-10 23:27:55,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=799050.0, ans=0.125 2024-08-10 23:27:57,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=799050.0, ans=0.125 2024-08-10 23:28:04,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7450, loss[loss=0.1142, beats_loss=0.01313, ecapa_loss=0.0001676, whisper_loss=0.09935, over 15845.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01168, ecapa_loss=0.0002201, whisper_loss=0.09447, over 3882513.42 frames. ], batch size: 62, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:28:22,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=799250.0, ans=0.0 2024-08-10 23:28:43,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=799350.0, ans=0.125 2024-08-10 23:28:54,938 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 23:29:10,365 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 23:29:10,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=799550.0, ans=0.2 2024-08-10 23:29:12,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.61 vs. limit=10.0 2024-08-10 23:29:12,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7500, loss[loss=0.1103, beats_loss=0.01102, ecapa_loss=0.0002246, whisper_loss=0.09703, over 21443.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01165, ecapa_loss=0.0002201, whisper_loss=0.09466, over 3873682.55 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:29:14,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=799650.0, ans=0.125 2024-08-10 23:29:36,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.81 vs. limit=10.0 2024-08-10 23:29:45,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2024-08-10 23:29:47,962 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 23:29:49,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=799850.0, ans=0.0 2024-08-10 23:29:50,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=799850.0, ans=0.2 2024-08-10 23:29:54,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=799950.0, ans=0.0 2024-08-10 23:29:58,162 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 23:30:01,239 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:30:04,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.761e+01 3.186e+01 3.767e+01 5.987e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 23:30:10,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=800050.0, ans=15.0 2024-08-10 23:30:24,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-08-10 23:30:25,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7550, loss[loss=0.1235, beats_loss=0.009551, ecapa_loss=0.0002528, whisper_loss=0.1114, over 18486.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01161, ecapa_loss=0.0002228, whisper_loss=0.09482, over 3887069.79 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:30:27,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=800150.0, ans=0.125 2024-08-10 23:30:34,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=800150.0, ans=0.025 2024-08-10 23:30:48,661 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 23:30:56,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=800350.0, ans=0.125 2024-08-10 23:30:57,405 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 14 from Vox, 28 from AS 2024-08-10 23:31:02,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=800350.0, ans=0.125 2024-08-10 23:31:11,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=800450.0, ans=0.07 2024-08-10 23:31:12,894 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 from AS 2024-08-10 23:31:19,212 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 24 from Vox, 25 from AS 2024-08-10 23:31:21,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-08-10 23:31:38,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7600, loss[loss=0.1039, beats_loss=0.009863, ecapa_loss=0.000253, whisper_loss=0.09155, over 15804.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0002225, whisper_loss=0.09433, over 3858037.49 frames. ], batch size: 64, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:31:40,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=800650.0, ans=0.125 2024-08-10 23:31:41,794 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 23:32:00,505 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-10 23:32:04,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2024-08-10 23:32:12,558 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 16 from Vox, 30 from AS 2024-08-10 23:32:19,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=800850.0, ans=0.0 2024-08-10 23:32:25,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=800950.0, ans=0.125 2024-08-10 23:32:29,149 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 23:32:30,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.785e+01 3.066e+01 3.767e+01 8.128e+01, threshold=6.132e+01, percent-clipped=1.0 2024-08-10 23:32:35,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=800950.0, ans=0.125 2024-08-10 23:32:40,773 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 23:32:43,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=801050.0, ans=0.125 2024-08-10 23:32:44,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=801050.0, ans=10.0 2024-08-10 23:32:51,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7650, loss[loss=0.1052, beats_loss=0.01002, ecapa_loss=0.0002394, whisper_loss=0.09281, over 18079.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01148, ecapa_loss=0.0002228, whisper_loss=0.09375, over 3863265.02 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:33:10,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=15.0 2024-08-10 23:33:16,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=22.5 2024-08-10 23:33:33,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=801450.0, ans=0.0 2024-08-10 23:33:35,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=801450.0, ans=0.125 2024-08-10 23:33:37,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.98 vs. limit=22.5 2024-08-10 23:33:39,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=801450.0, ans=0.125 2024-08-10 23:34:01,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7700, loss[loss=0.1245, beats_loss=0.009253, ecapa_loss=0.0001891, whisper_loss=0.1134, over 20464.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=0.0002212, whisper_loss=0.0934, over 3878228.75 frames. ], batch size: 72, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:34:05,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=801650.0, ans=0.05 2024-08-10 23:34:11,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=801650.0, ans=0.09899494936611666 2024-08-10 23:34:27,763 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 23:34:36,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.19 vs. 
limit=15.0 2024-08-10 23:34:46,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=801950.0, ans=0.125 2024-08-10 23:34:50,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.824e+01 3.342e+01 3.789e+01 5.468e+01, threshold=6.684e+01, percent-clipped=0.0 2024-08-10 23:34:57,813 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 23:35:05,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2024-08-10 23:35:10,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-08-10 23:35:11,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7750, loss[loss=0.1033, beats_loss=0.01172, ecapa_loss=0.0002296, whisper_loss=0.08932, over 15274.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01161, ecapa_loss=0.0002195, whisper_loss=0.09298, over 3874332.13 frames. ], batch size: 65, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:35:11,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=802150.0, ans=0.125 2024-08-10 23:35:14,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=802150.0, ans=0.0 2024-08-10 23:35:39,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. 
limit=6.0 2024-08-10 23:35:40,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=802350.0, ans=0.125 2024-08-10 23:36:25,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7800, loss[loss=0.09395, beats_loss=0.0134, ecapa_loss=0.0002507, whisper_loss=0.07805, over 21535.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002202, whisper_loss=0.09412, over 3913884.17 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:36:26,592 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 29 from Vox, 29 from AS 2024-08-10 23:36:26,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=802650.0, ans=0.2 2024-08-10 23:36:28,507 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:36:36,856 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 from AS 2024-08-10 23:36:39,273 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS 2024-08-10 23:36:43,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=802750.0, ans=0.125 2024-08-10 23:37:01,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=802850.0, ans=0.125 2024-08-10 23:37:14,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.895e+01 3.316e+01 3.988e+01 7.505e+01, threshold=6.631e+01, percent-clipped=2.0 2024-08-10 23:37:35,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7850, loss[loss=0.1091, beats_loss=0.008744, ecapa_loss=0.0002424, whisper_loss=0.09792, over 14680.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01153, ecapa_loss=0.0002208, whisper_loss=0.0945, over 3879680.67 frames. 
], batch size: 56, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:37:44,816 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 from AS 2024-08-10 23:37:51,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=803250.0, ans=0.1 2024-08-10 23:37:58,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=803250.0, ans=0.125 2024-08-10 23:37:59,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=803250.0, ans=0.125 2024-08-10 23:38:03,885 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 23:38:08,857 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.356e-03 2024-08-10 23:38:20,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=803450.0, ans=0.125 2024-08-10 23:38:38,249 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 23:38:47,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7900, loss[loss=0.1216, beats_loss=0.008599, ecapa_loss=0.0002318, whisper_loss=0.1107, over 15210.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01167, ecapa_loss=0.000219, whisper_loss=0.09418, over 3874842.35 frames. ], batch size: 61, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:38:52,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. 
limit=22.5 2024-08-10 23:38:55,175 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:39:07,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=803750.0, ans=0.2 2024-08-10 23:39:15,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=803850.0, ans=0.125 2024-08-10 23:39:16,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=803850.0, ans=0.125 2024-08-10 23:39:25,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=803850.0, ans=0.0 2024-08-10 23:39:33,970 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 23:39:37,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.898e+01 3.197e+01 3.826e+01 5.899e+01, threshold=6.393e+01, percent-clipped=0.0 2024-08-10 23:39:44,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0 2024-08-10 23:39:48,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2024-08-10 23:39:50,494 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 23:39:53,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.85 vs. limit=22.5 2024-08-10 23:39:58,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 7950, loss[loss=0.1245, beats_loss=0.01074, ecapa_loss=0.0001994, whisper_loss=0.1118, over 21177.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01162, ecapa_loss=0.0002189, whisper_loss=0.09466, over 3860598.64 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:40:02,948 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 16 from Vox, 40 from AS 2024-08-10 23:40:07,153 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.711e-02 2024-08-10 23:40:21,887 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 28 from LS+wenet, 16 from Vox, 19 from AS 2024-08-10 23:40:22,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=804250.0, ans=0.2 2024-08-10 23:40:33,809 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-10 23:40:37,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=804450.0, ans=0.2 2024-08-10 23:40:42,793 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 from AS 2024-08-10 23:41:05,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8000, loss[loss=0.1101, beats_loss=0.01213, ecapa_loss=0.000216, whisper_loss=0.0958, over 15391.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01164, ecapa_loss=0.000217, whisper_loss=0.0948, over 3881656.83 frames. ], batch size: 62, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:41:12,364 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 23:41:24,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=804750.0, ans=0.1 2024-08-10 23:41:30,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=804750.0, ans=0.125 2024-08-10 23:41:37,746 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 23:41:46,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=804950.0, ans=0.125 2024-08-10 23:41:51,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.694e+01 3.160e+01 3.631e+01 6.005e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 23:42:11,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8050, loss[loss=0.1216, beats_loss=0.009568, ecapa_loss=0.0001926, whisper_loss=0.1101, over 18607.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0116, ecapa_loss=0.0002177, whisper_loss=0.09548, over 3910378.95 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:42:18,210 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 from AS 2024-08-10 23:42:29,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=805250.0, ans=0.2 2024-08-10 23:42:36,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805250.0, ans=0.1 2024-08-10 23:42:49,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.52 vs. limit=10.0 2024-08-10 23:43:05,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-10 23:43:17,159 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 from AS 2024-08-10 23:43:18,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8100, loss[loss=0.1204, beats_loss=0.009739, ecapa_loss=0.0002845, whisper_loss=0.1078, over 21517.00 frames. 
], tot_loss[loss=0.1091, beats_loss=0.01158, ecapa_loss=0.0002192, whisper_loss=0.09531, over 3907277.70 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:43:20,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=805650.0, ans=0.125 2024-08-10 23:43:23,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-10 23:43:25,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=805650.0, ans=0.125 2024-08-10 23:43:28,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=805650.0, ans=0.2 2024-08-10 23:43:47,041 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 23:43:53,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=805850.0, ans=0.125 2024-08-10 23:43:54,054 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.569e-02 2024-08-10 23:44:04,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.753e+01 3.174e+01 3.635e+01 5.123e+01, threshold=6.349e+01, percent-clipped=0.0 2024-08-10 23:44:13,773 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 from AS 2024-08-10 23:44:22,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=806050.0, ans=0.125 2024-08-10 23:44:24,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8150, loss[loss=0.08056, beats_loss=0.01209, ecapa_loss=0.0002696, whisper_loss=0.06577, over 14152.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01161, ecapa_loss=0.0002187, whisper_loss=0.0947, over 3930996.06 frames. ], batch size: 60, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:44:29,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.15 vs. limit=22.5 2024-08-10 23:44:39,088 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 from AS 2024-08-10 23:44:45,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=806250.0, ans=0.0 2024-08-10 23:45:00,159 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 23:45:30,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8200, loss[loss=0.1018, beats_loss=0.01234, ecapa_loss=0.0002336, whisper_loss=0.0871, over 22085.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01166, ecapa_loss=0.0002184, whisper_loss=0.09395, over 3923648.15 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:45:34,516 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 27 from Vox, 25 from AS 2024-08-10 23:45:45,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=806750.0, ans=0.125 2024-08-10 23:45:46,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806750.0, ans=0.125 2024-08-10 23:45:47,049 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:45:48,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=806750.0, ans=0.0 2024-08-10 23:45:51,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2024-08-10 23:45:59,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=806850.0, ans=0.125 2024-08-10 23:45:59,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=806850.0, ans=0.025 2024-08-10 23:46:04,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-08-10 23:46:10,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:10,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:16,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.784e+01 3.124e+01 3.627e+01 5.044e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-10 23:46:16,473 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 15 from Vox, 26 from AS 2024-08-10 23:46:25,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=807050.0, ans=0.0 2024-08-10 23:46:31,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2024-08-10 23:46:34,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=807050.0, ans=0.2 2024-08-10 23:46:36,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8250, loss[loss=0.1167, beats_loss=0.009316, ecapa_loss=0.0002973, whisper_loss=0.1044, over 15194.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002198, whisper_loss=0.09463, over 3929384.57 frames. ], batch size: 65, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:46:54,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2024-08-10 23:46:54,728 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 23:47:16,054 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 13 from Vox, 19 from AS 2024-08-10 23:47:18,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807450.0, ans=0.125 2024-08-10 23:47:19,842 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 30 from Vox, 34 from AS 2024-08-10 23:47:29,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2024-08-10 23:47:32,052 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-10 23:47:36,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=807550.0, ans=0.0 2024-08-10 23:47:42,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8300, loss[loss=0.114, beats_loss=0.01091, ecapa_loss=0.0002256, whisper_loss=0.1008, over 24029.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01164, ecapa_loss=0.0002192, whisper_loss=0.09439, over 3920387.27 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:47:52,030 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 23:48:19,831 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 23:48:23,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=807950.0, ans=0.125 2024-08-10 23:48:29,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.802e+01 3.186e+01 3.664e+01 3.254e+02, threshold=6.372e+01, percent-clipped=4.0 2024-08-10 23:48:30,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-10 23:48:43,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=808050.0, ans=0.0 2024-08-10 23:48:46,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=808050.0, ans=0.125 2024-08-10 23:48:48,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8350, loss[loss=0.1027, beats_loss=0.01206, ecapa_loss=0.0001762, whisper_loss=0.08891, over 17343.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01165, ecapa_loss=0.0002191, whisper_loss=0.09408, over 3920776.39 frames. 
], batch size: 67, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:48:59,832 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 23:49:36,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-08-10 23:49:37,964 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 23:49:53,581 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8400, loss[loss=0.0991, beats_loss=0.0141, ecapa_loss=0.0001834, whisper_loss=0.08316, over 20287.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01164, ecapa_loss=0.0002188, whisper_loss=0.09392, over 3921586.99 frames. ], batch size: 83, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:50:00,208 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 23:50:24,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=808850.0, ans=0.0 2024-08-10 23:50:39,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.636e+01 3.039e+01 3.423e+01 5.250e+01, threshold=6.078e+01, percent-clipped=0.0 2024-08-10 23:50:55,370 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 23:50:58,411 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 40 from LS+wenet, 18 from Vox, 32 from AS 2024-08-10 23:50:59,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8450, loss[loss=0.1398, beats_loss=0.01045, ecapa_loss=0.0002187, whisper_loss=0.1272, over 23201.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01163, ecapa_loss=0.0002208, whisper_loss=0.09414, over 3924216.16 frames. 
], batch size: 90, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:51:22,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=809250.0, ans=0.125 2024-08-10 23:51:44,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=809450.0, ans=0.015 2024-08-10 23:52:00,939 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 23:52:02,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2024-08-10 23:52:06,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8500, loss[loss=0.102, beats_loss=0.01095, ecapa_loss=0.0001997, whisper_loss=0.08908, over 21713.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.00022, whisper_loss=0.09426, over 3924324.77 frames. ], batch size: 84, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:52:17,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=809650.0, ans=0.0 2024-08-10 23:52:33,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=809750.0, ans=0.2 2024-08-10 23:52:44,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809850.0, ans=0.125 2024-08-10 23:52:59,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.787e+01 3.102e+01 3.651e+01 5.135e+01, threshold=6.204e+01, percent-clipped=0.0 2024-08-10 23:53:08,717 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
31 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 23:53:21,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=810150.0, ans=0.0 2024-08-10 23:53:21,960 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8550, loss[loss=0.1161, beats_loss=0.01091, ecapa_loss=0.0002178, whisper_loss=0.103, over 17004.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01147, ecapa_loss=0.0002211, whisper_loss=0.09554, over 3931316.82 frames. ], batch size: 66, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:53:39,019 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 10 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 23:53:46,187 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 from AS 2024-08-10 23:53:53,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=810350.0, ans=0.0 2024-08-10 23:53:53,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=810350.0, ans=0.2 2024-08-10 23:53:56,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810350.0, ans=0.125 2024-08-10 23:53:58,087 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS 2024-08-10 23:54:04,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=810450.0, ans=0.0 2024-08-10 23:54:09,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=810450.0, ans=0.2 2024-08-10 23:54:13,617 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 23:54:13,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=810450.0, ans=0.125 2024-08-10 23:54:19,822 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS 2024-08-10 23:54:22,456 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 18 from Vox, 15 from AS 2024-08-10 23:54:28,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=810550.0, ans=0.07 2024-08-10 23:54:32,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810550.0, ans=0.125 2024-08-10 23:54:34,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8600, loss[loss=0.09563, beats_loss=0.01406, ecapa_loss=0.0001543, whisper_loss=0.08002, over 15270.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01152, ecapa_loss=0.0002188, whisper_loss=0.09559, over 3885651.85 frames. ], batch size: 57, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:54:51,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=810750.0, ans=0.2 2024-08-10 23:54:53,421 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 23:54:56,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=810750.0, ans=0.125 2024-08-10 23:55:03,178 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 23:55:04,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=810850.0, ans=0.125 2024-08-10 23:55:08,684 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 23:55:10,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=810850.0, ans=0.015 2024-08-10 23:55:21,543 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 23:55:22,600 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 23:55:23,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.846e+01 3.382e+01 3.840e+01 6.128e+01, threshold=6.764e+01, percent-clipped=0.0 2024-08-10 23:55:34,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=811050.0, ans=0.2 2024-08-10 23:55:44,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8650, loss[loss=0.1005, beats_loss=0.01254, ecapa_loss=0.000156, whisper_loss=0.08639, over 15499.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01156, ecapa_loss=0.0002193, whisper_loss=0.09588, over 3913203.94 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:56:00,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=12.0 2024-08-10 23:56:16,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.48 vs. limit=10.0 2024-08-10 23:56:27,182 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 23:56:55,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8700, loss[loss=0.1084, beats_loss=0.008682, ecapa_loss=0.0002504, whisper_loss=0.09717, over 17185.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01158, ecapa_loss=0.0002182, whisper_loss=0.09509, over 3946194.13 frames. 
], batch size: 72, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:57:01,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=12.0 2024-08-10 23:57:26,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=811850.0, ans=0.2 2024-08-10 23:57:43,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.694e+01 2.974e+01 3.412e+01 6.571e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-10 23:57:54,489 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 23:57:57,720 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 23:58:04,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8750, loss[loss=0.08171, beats_loss=0.01303, ecapa_loss=0.0002549, whisper_loss=0.06612, over 13838.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01157, ecapa_loss=0.0002185, whisper_loss=0.09459, over 3882927.70 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:58:05,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=812150.0, ans=22.5 2024-08-10 23:58:22,201 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 23:58:25,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=812250.0, ans=0.0 2024-08-10 23:58:26,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812250.0, ans=0.1 2024-08-10 23:58:49,506 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 23:59:12,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8800, loss[loss=0.1046, beats_loss=0.01494, ecapa_loss=0.0001914, whisper_loss=0.08775, over 20617.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01153, ecapa_loss=0.0002183, whisper_loss=0.09592, over 3904141.98 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:59:14,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2024-08-10 23:59:22,241 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 23:59:23,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=812650.0, ans=0.125 2024-08-10 23:59:24,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0 2024-08-10 23:59:33,085 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 23:59:38,527 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 23:59:39,737 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 23:59:44,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=812850.0, ans=0.0 2024-08-10 23:59:58,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. 
limit=22.5 2024-08-10 23:59:59,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.902e+01 3.394e+01 3.776e+01 5.499e+01, threshold=6.788e+01, percent-clipped=0.0 2024-08-11 00:00:06,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-11 00:00:07,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=813050.0, ans=0.125 2024-08-11 00:00:21,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8850, loss[loss=0.1059, beats_loss=0.01318, ecapa_loss=0.0001827, whisper_loss=0.09091, over 18600.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01153, ecapa_loss=0.000219, whisper_loss=0.09546, over 3879021.07 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:00:24,411 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-11 00:00:33,070 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 00:00:34,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=813250.0, ans=0.1 2024-08-11 00:00:40,923 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 00:00:43,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813250.0, ans=0.1 2024-08-11 00:01:23,094 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 29 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 00:01:27,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.64 vs. 
limit=22.5 2024-08-11 00:01:30,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8900, loss[loss=0.1157, beats_loss=0.01083, ecapa_loss=0.0002341, whisper_loss=0.1025, over 22191.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01158, ecapa_loss=0.0002173, whisper_loss=0.09532, over 3851178.57 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:01:44,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.13 vs. limit=10.0 2024-08-11 00:01:53,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2024-08-11 00:02:05,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813850.0, ans=0.1 2024-08-11 00:02:10,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=813950.0, ans=0.125 2024-08-11 00:02:17,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=813950.0, ans=0.0 2024-08-11 00:02:18,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.661e+01 2.983e+01 3.454e+01 5.391e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 00:02:20,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=813950.0, ans=0.125 2024-08-11 00:02:22,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=813950.0, ans=0.125 2024-08-11 00:02:28,931 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 00:02:37,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 8950, loss[loss=0.09401, beats_loss=0.01225, ecapa_loss=0.0002467, whisper_loss=0.07929, over 21811.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01166, ecapa_loss=0.0002165, whisper_loss=0.09564, over 3879775.98 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:02:38,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=814150.0, ans=0.125 2024-08-11 00:02:47,455 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 00:02:50,134 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 00:03:02,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=814250.0, ans=0.125 2024-08-11 00:03:03,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=814350.0, ans=0.0 2024-08-11 00:03:13,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=814350.0, ans=0.0 2024-08-11 00:03:14,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814350.0, ans=0.125 2024-08-11 00:03:27,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=814450.0, ans=0.125 2024-08-11 00:03:28,261 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 00:03:30,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=814550.0, ans=0.125 2024-08-11 00:03:36,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814550.0, ans=0.1 2024-08-11 00:03:40,288 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 00:03:44,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9000, loss[loss=0.1067, beats_loss=0.01364, ecapa_loss=0.0002002, whisper_loss=0.09105, over 23301.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01171, ecapa_loss=0.0002171, whisper_loss=0.09497, over 3886536.45 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:03:44,139 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 00:04:24,170 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0006942, whisper_loss=0.2529, over 922467.00 frames. 2024-08-11 00:04:40,368 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8875, 2.4059, 2.3843, 1.5563, 1.4302, 2.0191, 2.5146, 2.4576], device='cuda:3') 2024-08-11 00:04:43,467 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 00:06:09,228 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6297, 3.1028, 3.4333, 3.2379], device='cuda:3') 2024-08-11 00:06:37,863 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on AT_audioset: loss=0.02592, beats_loss=0.02592, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 00:06:37,867 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 00:06:57,809 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 00:07:05,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=814750.0, ans=0.0 2024-08-11 00:07:09,553 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 00:07:29,531 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 00:07:30,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.866e+01 3.382e+01 4.145e+01 7.682e+01, threshold=6.764e+01, percent-clipped=3.0 2024-08-11 00:07:35,983 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 00:07:43,308 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 00:07:43,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815050.0, ans=0.1 2024-08-11 00:07:51,180 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 00:07:51,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=815050.0, ans=0.2 2024-08-11 00:07:54,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9050, loss[loss=0.113, beats_loss=0.01195, ecapa_loss=0.0001869, whisper_loss=0.09917, over 20731.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01152, ecapa_loss=0.0002188, whisper_loss=0.09567, over 3903269.44 frames. ], batch size: 80, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:07:55,178 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 00:07:56,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=815150.0, ans=0.05 2024-08-11 00:08:02,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-11 00:08:02,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=815150.0, ans=0.015 2024-08-11 00:08:38,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=815450.0, ans=0.2 2024-08-11 00:08:39,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-11 00:08:40,404 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 00:08:49,645 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 00:09:08,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9100, loss[loss=0.08177, beats_loss=0.01026, ecapa_loss=0.0002642, whisper_loss=0.06887, over 14564.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01141, ecapa_loss=0.0002202, whisper_loss=0.09551, over 3887168.92 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:09:08,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-11 00:09:27,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.67 vs. 
limit=15.0 2024-08-11 00:09:31,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=815750.0, ans=0.125 2024-08-11 00:09:43,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=815850.0, ans=0.2 2024-08-11 00:09:47,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815850.0, ans=0.125 2024-08-11 00:09:58,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.712e+01 2.999e+01 3.385e+01 5.028e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 00:10:20,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9150, loss[loss=0.1132, beats_loss=0.01106, ecapa_loss=0.0001708, whisper_loss=0.1004, over 14959.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01153, ecapa_loss=0.0002194, whisper_loss=0.09477, over 3896189.38 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:10:22,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=816150.0, ans=0.2 2024-08-11 00:10:35,356 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 00:10:43,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=816250.0, ans=0.125 2024-08-11 00:10:47,330 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 00:10:52,281 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 00:10:56,163 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 00:11:01,666 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 00:11:28,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=816550.0, ans=0.125 2024-08-11 00:11:29,125 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 00:11:36,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9200, loss[loss=0.08007, beats_loss=0.01553, ecapa_loss=0.0001859, whisper_loss=0.06269, over 22620.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01154, ecapa_loss=0.0002199, whisper_loss=0.09472, over 3929691.05 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:11:54,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2024-08-11 00:11:57,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=816750.0, ans=0.0 2024-08-11 00:12:09,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=816850.0, ans=0.2 2024-08-11 00:12:16,821 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 00:12:27,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.632e+01 3.033e+01 3.497e+01 1.383e+02, threshold=6.066e+01, percent-clipped=1.0 2024-08-11 00:12:38,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=817050.0, ans=10.0 2024-08-11 00:12:42,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2024-08-11 00:12:46,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817050.0, ans=0.125 2024-08-11 00:12:47,733 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 00:12:48,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9250, loss[loss=0.1097, beats_loss=0.01218, ecapa_loss=0.0002027, whisper_loss=0.09549, over 15580.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0116, ecapa_loss=0.0002197, whisper_loss=0.09453, over 3934295.65 frames. ], batch size: 62, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:12:49,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-11 00:12:57,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=817150.0, ans=0.125 2024-08-11 00:13:14,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=817250.0, ans=0.0 2024-08-11 00:13:40,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=817450.0, ans=0.0 2024-08-11 00:13:56,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=817550.0, ans=0.95 2024-08-11 00:13:57,758 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 00:13:59,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=817550.0, ans=0.05 2024-08-11 00:14:06,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9300, loss[loss=0.1191, beats_loss=0.009346, ecapa_loss=0.0002544, whisper_loss=0.1072, over 22087.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01167, ecapa_loss=0.0002187, whisper_loss=0.09421, over 3924220.39 frames. ], batch size: 88, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:14:20,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=817750.0, ans=0.0 2024-08-11 00:14:25,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=817750.0, ans=0.05 2024-08-11 00:14:26,333 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 00:14:39,734 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 00:14:40,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817850.0, ans=0.1 2024-08-11 00:14:58,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.667e+01 2.966e+01 3.383e+01 7.144e+01, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 00:15:09,489 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 00:15:19,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9350, loss[loss=0.1104, beats_loss=0.01141, ecapa_loss=0.000184, whisper_loss=0.0972, over 17980.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01172, ecapa_loss=0.0002202, whisper_loss=0.0931, over 3900477.67 frames. 
], batch size: 67, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:15:21,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-11 00:15:22,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=818150.0, ans=0.125 2024-08-11 00:15:30,772 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 00:15:42,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=818250.0, ans=0.0 2024-08-11 00:15:47,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2024-08-11 00:15:48,124 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 00:15:48,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=818350.0, ans=0.125 2024-08-11 00:16:03,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2024-08-11 00:16:32,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9400, loss[loss=0.09664, beats_loss=0.01142, ecapa_loss=0.0002499, whisper_loss=0.08272, over 16516.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01164, ecapa_loss=0.0002214, whisper_loss=0.09326, over 3859575.18 frames. ], batch size: 67, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:16:32,876 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 00:16:38,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=818650.0, ans=0.0 2024-08-11 00:17:06,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=818850.0, ans=0.2 2024-08-11 00:17:14,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818950.0, ans=0.0 2024-08-11 00:17:20,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.820e+01 3.162e+01 3.777e+01 5.486e+01, threshold=6.323e+01, percent-clipped=0.0 2024-08-11 00:17:32,950 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 00:17:33,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2024-08-11 00:17:41,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9450, loss[loss=0.1138, beats_loss=0.01066, ecapa_loss=0.0001805, whisper_loss=0.1013, over 18535.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01169, ecapa_loss=0.0002209, whisper_loss=0.0934, over 3850283.25 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:17:54,874 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 00:18:00,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=819250.0, ans=0.125 2024-08-11 00:18:07,016 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 00:18:19,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=819350.0, ans=0.07 2024-08-11 00:18:32,956 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 15 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 00:18:36,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2024-08-11 00:18:48,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9500, loss[loss=0.1156, beats_loss=0.01176, ecapa_loss=0.000188, whisper_loss=0.1019, over 22583.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01169, ecapa_loss=0.0002208, whisper_loss=0.09347, over 3871495.09 frames. ], batch size: 88, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:18:55,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-11 00:18:57,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=819650.0, ans=0.125 2024-08-11 00:19:24,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-11 00:19:29,552 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 00:19:37,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.851e+01 3.283e+01 3.927e+01 7.522e+01, threshold=6.566e+01, percent-clipped=2.0 2024-08-11 00:19:41,991 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 00:19:45,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=820050.0, ans=0.125 2024-08-11 00:19:52,808 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 00:19:55,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820050.0, ans=0.1 2024-08-11 00:19:58,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9550, loss[loss=0.09231, beats_loss=0.01326, ecapa_loss=0.0001641, whisper_loss=0.0774, over 20569.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01181, ecapa_loss=0.0002188, whisper_loss=0.09262, over 3879800.72 frames. ], batch size: 82, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:20:01,353 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 00:20:15,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820250.0, ans=0.125 2024-08-11 00:20:25,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820350.0, ans=0.0 2024-08-11 00:20:34,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820350.0, ans=0.1 2024-08-11 00:20:58,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=820550.0, ans=0.125 2024-08-11 00:21:04,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9600, loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.00021, whisper_loss=0.08827, over 19647.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01174, ecapa_loss=0.0002188, whisper_loss=0.09333, over 3889821.18 frames. 
], batch size: 77, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:21:19,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2024-08-11 00:21:23,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=820750.0, ans=0.0 2024-08-11 00:21:24,315 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 00:21:29,830 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 00:21:30,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=820850.0, ans=0.125 2024-08-11 00:21:35,527 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.586e-02 2024-08-11 00:21:35,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=820850.0, ans=0.0 2024-08-11 00:21:47,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-11 00:21:48,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=820950.0, ans=0.0 2024-08-11 00:21:50,895 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.676e+01 3.117e+01 3.565e+01 7.658e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 00:21:58,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2024-08-11 00:22:04,208 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
35 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 00:22:08,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=821050.0, ans=0.125 2024-08-11 00:22:08,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=821050.0, ans=0.07 2024-08-11 00:22:10,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9650, loss[loss=0.08845, beats_loss=0.01195, ecapa_loss=0.0002351, whisper_loss=0.07415, over 14658.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01163, ecapa_loss=0.0002212, whisper_loss=0.09409, over 3858546.81 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:22:16,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=821150.0, ans=0.125 2024-08-11 00:22:20,507 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 00:22:27,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=821250.0, ans=0.05 2024-08-11 00:22:31,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=15.0 2024-08-11 00:23:06,176 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 00:23:06,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=821550.0, ans=0.0 2024-08-11 00:23:16,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9700, loss[loss=0.1119, beats_loss=0.0115, ecapa_loss=0.0002703, whisper_loss=0.09767, over 15194.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01165, ecapa_loss=0.0002215, whisper_loss=0.09409, over 3840214.67 frames. 
], batch size: 64, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:23:19,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2024-08-11 00:23:26,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821650.0, ans=0.1 2024-08-11 00:23:28,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=821650.0, ans=0.2 2024-08-11 00:23:39,403 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 00:23:51,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.76 vs. limit=15.0 2024-08-11 00:23:59,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2024-08-11 00:24:02,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.803e+01 3.195e+01 3.718e+01 6.974e+01, threshold=6.391e+01, percent-clipped=1.0 2024-08-11 00:24:03,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821950.0, ans=0.1 2024-08-11 00:24:14,817 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 00:24:16,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=822050.0, ans=0.0 2024-08-11 00:24:22,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9750, loss[loss=0.08505, beats_loss=0.0143, ecapa_loss=0.0002171, whisper_loss=0.06858, over 21327.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01164, ecapa_loss=0.0002212, whisper_loss=0.09352, over 3799100.46 frames. 
], batch size: 89, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:24:22,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=822150.0, ans=0.125 2024-08-11 00:24:27,620 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 00:24:28,922 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-11 00:24:41,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=822250.0, ans=0.2 2024-08-11 00:24:42,037 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 00:25:09,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=822450.0, ans=0.0 2024-08-11 00:25:12,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822550.0, ans=0.1 2024-08-11 00:25:26,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9800, loss[loss=0.1066, beats_loss=0.01253, ecapa_loss=0.0002159, whisper_loss=0.09195, over 22315.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01168, ecapa_loss=0.0002203, whisper_loss=0.09309, over 3803006.82 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:25:30,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-11 00:25:35,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2024-08-11 00:25:37,222 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 00:25:44,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822750.0, ans=0.1 2024-08-11 00:25:57,212 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:26:03,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0 2024-08-11 00:26:06,969 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 00:26:07,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=822950.0, ans=0.125 2024-08-11 00:26:10,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=822950.0, ans=0.125 2024-08-11 00:26:10,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=822950.0, ans=0.0 2024-08-11 00:26:12,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.728e+01 3.058e+01 3.533e+01 7.097e+01, threshold=6.116e+01, percent-clipped=1.0 2024-08-11 00:26:16,175 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 34 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 00:26:25,512 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 00:26:31,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9850, loss[loss=0.1129, beats_loss=0.01281, ecapa_loss=0.0002039, whisper_loss=0.09803, over 22796.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01165, ecapa_loss=0.0002196, whisper_loss=0.09353, over 3817774.33 frames. 
], batch size: 88, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:26:40,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-11 00:26:45,079 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 00:26:56,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-08-11 00:27:09,630 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 00:27:19,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=823450.0, ans=0.125 2024-08-11 00:27:28,076 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 00:27:31,992 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 00:27:36,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=823650.0, ans=0.125 2024-08-11 00:27:37,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9900, loss[loss=0.08946, beats_loss=0.01414, ecapa_loss=0.0001853, whisper_loss=0.07346, over 20159.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01166, ecapa_loss=0.000219, whisper_loss=0.09403, over 3838692.85 frames. ], batch size: 80, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:27:39,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=823650.0, ans=0.07 2024-08-11 00:27:41,562 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 00:27:49,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=823750.0, ans=0.125 2024-08-11 00:27:50,526 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 00:27:59,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=823750.0, ans=0.0 2024-08-11 00:28:01,290 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 00:28:21,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=823950.0, ans=0.2 2024-08-11 00:28:23,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.723e+01 2.993e+01 3.476e+01 9.466e+01, threshold=5.985e+01, percent-clipped=1.0 2024-08-11 00:28:26,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=823950.0, ans=0.125 2024-08-11 00:28:30,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=824050.0, ans=0.125 2024-08-11 00:28:37,047 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-11 00:28:43,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 9950, loss[loss=0.09527, beats_loss=0.01333, ecapa_loss=0.0002621, whisper_loss=0.07932, over 22480.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01172, ecapa_loss=0.0002197, whisper_loss=0.09375, over 3837839.05 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:28:44,996 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-11 00:28:51,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824150.0, ans=0.1 2024-08-11 00:28:54,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-08-11 00:28:59,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=824250.0, ans=0.125 2024-08-11 00:29:15,122 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 00:29:16,635 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 00:29:17,818 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-11 00:29:18,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824350.0, ans=0.125 2024-08-11 00:29:23,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=824450.0, ans=0.125 2024-08-11 00:29:24,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824450.0, ans=0.125 2024-08-11 00:29:29,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=824450.0, ans=0.125 2024-08-11 00:29:40,692 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 00:29:48,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10000, loss[loss=0.09454, beats_loss=0.01313, ecapa_loss=0.0001838, whisper_loss=0.07958, over 17700.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01171, ecapa_loss=0.0002204, whisper_loss=0.09333, over 3848331.75 frames. ], batch size: 70, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:29:53,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-08-11 00:30:02,542 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 00:30:08,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824750.0, ans=0.1 2024-08-11 00:30:14,192 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 00:30:14,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=824850.0, ans=0.09899494936611666 2024-08-11 00:30:28,658 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 00:30:28,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824950.0, ans=0.1 2024-08-11 00:30:37,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.691e+01 3.032e+01 3.574e+01 5.004e+01, threshold=6.065e+01, percent-clipped=0.0 2024-08-11 00:30:39,166 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 00:30:56,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10050, loss[loss=0.109, beats_loss=0.00963, ecapa_loss=0.0002131, whisper_loss=0.09728, over 23078.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01171, ecapa_loss=0.0002194, whisper_loss=0.0929, over 3855258.08 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:31:05,056 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 00:31:05,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=825150.0, ans=0.0 2024-08-11 00:31:15,513 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 00:31:19,359 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 00:31:22,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=825250.0, ans=0.95 2024-08-11 00:32:30,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=12.0 2024-08-11 00:32:31,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=825550.0, ans=0.2 2024-08-11 00:32:35,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10100, loss[loss=0.1114, beats_loss=0.01147, ecapa_loss=0.0001868, whisper_loss=0.09802, over 20734.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01164, ecapa_loss=0.0002203, whisper_loss=0.0933, over 3837955.85 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:32:39,855 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 00:32:41,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. 
limit=15.0 2024-08-11 00:32:47,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=825650.0, ans=0.125 2024-08-11 00:32:52,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=825650.0, ans=0.015 2024-08-11 00:32:54,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=825650.0, ans=0.2 2024-08-11 00:33:01,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-11 00:33:42,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=825950.0, ans=0.125 2024-08-11 00:33:55,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.818e+01 3.128e+01 3.591e+01 5.480e+01, threshold=6.256e+01, percent-clipped=0.0 2024-08-11 00:33:59,988 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 00:34:04,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=825950.0, ans=0.125 2024-08-11 00:34:04,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825950.0, ans=0.1 2024-08-11 00:34:08,122 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 00:34:13,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826050.0, ans=0.1 2024-08-11 00:34:31,282 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 00:34:34,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10150, loss[loss=0.07962, beats_loss=0.01354, ecapa_loss=0.000189, whisper_loss=0.06419, over 17531.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01161, ecapa_loss=0.0002205, whisper_loss=0.09324, over 3870768.24 frames. ], batch size: 71, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:34:56,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=826150.0, ans=0.125 2024-08-11 00:34:56,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=826150.0, ans=0.0 2024-08-11 00:35:17,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=826250.0, ans=0.05 2024-08-11 00:35:25,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-11 00:35:54,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=826450.0, ans=22.5 2024-08-11 00:36:19,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2024-08-11 00:36:37,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10200, loss[loss=0.1072, beats_loss=0.009057, ecapa_loss=0.0002434, whisper_loss=0.09567, over 20673.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01159, ecapa_loss=0.0002202, whisper_loss=0.09341, over 3871060.78 frames. ], batch size: 83, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:36:40,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. 
limit=12.0 2024-08-11 00:36:44,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-11 00:36:53,124 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 00:37:14,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=826750.0, ans=0.125 2024-08-11 00:37:22,928 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 00:37:40,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.717e+01 3.021e+01 3.434e+01 5.708e+01, threshold=6.043e+01, percent-clipped=0.0 2024-08-11 00:37:41,180 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 00:37:50,534 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 00:38:03,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10250, loss[loss=0.1207, beats_loss=0.0106, ecapa_loss=0.0002401, whisper_loss=0.1077, over 22083.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01155, ecapa_loss=0.0002221, whisper_loss=0.09371, over 3866558.53 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:38:10,324 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 00:38:28,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827250.0, ans=0.1 2024-08-11 00:38:31,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827250.0, ans=0.125 2024-08-11 00:38:36,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=827350.0, ans=0.125 2024-08-11 00:38:49,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=22.5 2024-08-11 00:38:53,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=827450.0, ans=0.125 2024-08-11 00:39:05,434 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 00:39:08,807 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 00:39:19,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10300, loss[loss=0.1348, beats_loss=0.008942, ecapa_loss=0.0002109, whisper_loss=0.1238, over 22966.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01162, ecapa_loss=0.0002211, whisper_loss=0.0937, over 3906730.05 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:39:21,334 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
20 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-11 00:39:34,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=827750.0, ans=0.2 2024-08-11 00:39:57,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827850.0, ans=0.125 2024-08-11 00:40:04,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=827950.0, ans=0.125 2024-08-11 00:40:12,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=827950.0, ans=0.125 2024-08-11 00:40:13,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.682e+01 2.875e+01 3.472e+01 4.715e+01, threshold=5.749e+01, percent-clipped=0.0 2024-08-11 00:40:36,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10350, loss[loss=0.09306, beats_loss=0.01425, ecapa_loss=0.0002169, whisper_loss=0.07664, over 20293.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002206, whisper_loss=0.0942, over 3894003.32 frames. ], batch size: 85, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:40:50,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-11 00:40:54,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=828250.0, ans=0.125 2024-08-11 00:41:00,129 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 00:41:03,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=828250.0, ans=0.0 2024-08-11 00:41:07,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=828350.0, ans=0.0 2024-08-11 00:41:09,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2024-08-11 00:41:54,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10400, loss[loss=0.09295, beats_loss=0.01424, ecapa_loss=0.0002263, whisper_loss=0.07644, over 13954.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0002215, whisper_loss=0.09429, over 3861970.12 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:42:09,303 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-11 00:42:09,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=828750.0, ans=0.125 2024-08-11 00:42:16,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=828750.0, ans=0.07 2024-08-11 00:42:23,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=828750.0, ans=0.07 2024-08-11 00:42:23,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=828750.0, ans=0.125 2024-08-11 00:42:49,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.708e+01 2.999e+01 3.498e+01 5.568e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 00:42:50,175 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 00:42:50,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=828950.0, ans=0.0 2024-08-11 00:42:50,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=828950.0, ans=0.09899494936611666 2024-08-11 00:43:11,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=829050.0, ans=0.5 2024-08-11 00:43:14,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10450, loss[loss=0.07338, beats_loss=0.01563, ecapa_loss=0.0001928, whisper_loss=0.05583, over 13843.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01162, ecapa_loss=0.0002189, whisper_loss=0.09388, over 3873526.72 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:43:22,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=829150.0, ans=0.04949747468305833 2024-08-11 00:43:23,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829150.0, ans=0.1 2024-08-11 00:43:25,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829150.0, ans=0.1 2024-08-11 00:43:30,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=829250.0, ans=0.0 2024-08-11 00:43:35,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=829250.0, ans=0.0 2024-08-11 00:43:35,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. 
limit=15.0
2024-08-11 00:43:42,764 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 00:44:16,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5
2024-08-11 00:44:26,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=829550.0, ans=0.0
2024-08-11 00:44:35,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10500, loss[loss=0.05506, beats_loss=0.01525, ecapa_loss=0.0002256, whisper_loss=0.03756, over 16187.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01158, ecapa_loss=0.0002192, whisper_loss=0.09387, over 3871409.84 frames. ], batch size: 68, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:44:36,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=829650.0, ans=0.0
2024-08-11 00:44:48,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=829650.0, ans=0.125
2024-08-11 00:44:53,382 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 25 from Vox, 33 from AS
2024-08-11 00:44:53,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=829750.0, ans=0.125
2024-08-11 00:44:59,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5
2024-08-11 00:45:04,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=829850.0, ans=0.5
2024-08-11 00:45:05,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=829850.0, ans=0.2
2024-08-11 00:45:08,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=829850.0, ans=22.5
2024-08-11 00:45:14,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=829850.0, ans=0.125
2024-08-11 00:45:16,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=829850.0, ans=0.125
2024-08-11 00:45:18,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=829850.0, ans=0.125
2024-08-11 00:45:18,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=829850.0, ans=0.125
2024-08-11 00:45:19,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=829950.0, ans=0.1
2024-08-11 00:45:19,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0
2024-08-11 00:45:26,298 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 19 from Vox, 16 from AS
2024-08-11 00:45:27,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.730e+01 2.985e+01 3.287e+01 5.938e+01, threshold=5.970e+01, percent-clipped=0.0
2024-08-11 00:45:38,023 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 from AS
2024-08-11 00:45:41,156 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS
2024-08-11 00:45:49,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10550, loss[loss=0.1073, beats_loss=0.013, ecapa_loss=0.0001433, whisper_loss=0.09284, over 18538.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01153, ecapa_loss=0.0002202, whisper_loss=0.0941, over 3833230.92 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:45:50,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=830150.0, ans=0.0
2024-08-11 00:46:18,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-08-11 00:46:33,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=830350.0, ans=0.125
2024-08-11 00:47:06,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=830550.0, ans=0.125
2024-08-11 00:47:08,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10600, loss[loss=0.1098, beats_loss=0.009982, ecapa_loss=0.0002871, whisper_loss=0.09699, over 18286.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01152, ecapa_loss=0.000221, whisper_loss=0.09445, over 3833093.48 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:47:21,482 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 from AS
2024-08-11 00:47:24,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=830750.0, ans=0.0
2024-08-11 00:47:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=830750.0, ans=0.2
2024-08-11 00:47:27,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=830750.0, ans=0.0
2024-08-11 00:47:42,424 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 00:47:51,999 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 from AS
2024-08-11 00:48:00,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.647e+01 3.131e+01 3.600e+01 5.761e+01, threshold=6.263e+01, percent-clipped=0.0
2024-08-11 00:48:18,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=831050.0, ans=0.0
2024-08-11 00:48:23,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10650, loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0001847, whisper_loss=0.09448, over 19744.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01149, ecapa_loss=0.0002206, whisper_loss=0.0942, over 3798186.24 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:49:10,480 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 from AS
2024-08-11 00:49:28,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0
2024-08-11 00:49:40,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10700, loss[loss=0.1054, beats_loss=0.01145, ecapa_loss=0.000228, whisper_loss=0.09163, over 16878.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01148, ecapa_loss=0.0002196, whisper_loss=0.09467, over 3809344.86 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:49:52,261 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.271e-02
2024-08-11 00:49:55,423 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS
2024-08-11 00:50:08,138 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS
2024-08-11 00:50:08,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=831850.0, ans=0.125
2024-08-11 00:50:12,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=12.0
2024-08-11 00:50:17,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=831850.0, ans=0.125
2024-08-11 00:50:31,518 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.817e+01 3.065e+01 3.573e+01 8.621e+01, threshold=6.130e+01, percent-clipped=1.0
2024-08-11 00:50:35,304 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS
2024-08-11 00:50:37,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=831950.0, ans=0.125
2024-08-11 00:50:37,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=831950.0, ans=0.05
2024-08-11 00:50:53,657 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10750, loss[loss=0.1169, beats_loss=0.009803, ecapa_loss=0.0002221, whisper_loss=0.1049, over 20613.00 frames.
], tot_loss[loss=0.109, beats_loss=0.01144, ecapa_loss=0.0002209, whisper_loss=0.09533, over 3828383.59 frames. ], batch size: 81, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:50:56,712 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 from AS
2024-08-11 00:51:11,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832250.0, ans=0.1
2024-08-11 00:51:13,555 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 from AS
2024-08-11 00:51:14,883 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-11 00:51:28,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832350.0, ans=0.1
2024-08-11 00:51:39,282 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS
2024-08-11 00:51:52,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5
2024-08-11 00:51:56,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=832550.0, ans=0.0
2024-08-11 00:52:03,787 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 21 from Vox, 20 from AS
2024-08-11 00:52:11,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10800, loss[loss=0.1249, beats_loss=0.01001, ecapa_loss=0.0002694, whisper_loss=0.1121, over 22892.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01145, ecapa_loss=0.0002218, whisper_loss=0.09477, over 3852408.91 frames. ], batch size: 93, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:52:19,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=832650.0, ans=0.125
2024-08-11 00:52:25,657 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 from AS
2024-08-11 00:52:43,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=832850.0, ans=0.2
2024-08-11 00:52:43,850 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 from AS
2024-08-11 00:52:46,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=832850.0, ans=0.0
2024-08-11 00:52:49,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
2024-08-11 00:52:56,443 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 41 from LS+wenet, 15 from Vox, 33 from AS
2024-08-11 00:53:04,609 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.723e+01 3.219e+01 3.827e+01 1.923e+02, threshold=6.438e+01, percent-clipped=1.0
2024-08-11 00:53:05,106 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS
2024-08-11 00:53:15,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=833050.0, ans=0.0
2024-08-11 00:53:19,904 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 28 from Vox, 23 from AS
2024-08-11 00:53:24,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=833050.0, ans=0.05
2024-08-11 00:53:26,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10850, loss[loss=0.09928, beats_loss=0.01326, ecapa_loss=0.0001603, whisper_loss=0.08442, over 19392.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01149, ecapa_loss=0.0002207, whisper_loss=0.09553, over 3880344.18 frames. ], batch size: 76, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:53:31,814 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 from AS
2024-08-11 00:53:38,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=833150.0, ans=0.2
2024-08-11 00:53:42,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=833250.0, ans=0.125
2024-08-11 00:54:08,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=833350.0, ans=0.125
2024-08-11 00:54:08,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=833350.0, ans=0.09899494936611666
2024-08-11 00:54:34,766 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 from AS
2024-08-11 00:54:39,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0
2024-08-11 00:54:39,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0
2024-08-11 00:54:43,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10900, loss[loss=0.1121, beats_loss=0.0103, ecapa_loss=0.0002038, whisper_loss=0.09981, over 16204.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01146, ecapa_loss=0.0002206, whisper_loss=0.09613, over 3879801.67 frames. ], batch size: 62, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:54:44,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.19 vs. limit=22.5
2024-08-11 00:54:57,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=833750.0, ans=0.2
2024-08-11 00:55:07,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=833750.0, ans=0.125
2024-08-11 00:55:35,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.641e+01 2.975e+01 3.587e+01 5.714e+01, threshold=5.950e+01, percent-clipped=0.0
2024-08-11 00:55:38,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.84 vs. limit=10.0
2024-08-11 00:55:42,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0
2024-08-11 00:55:58,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 10950, loss[loss=0.1019, beats_loss=0.01088, ecapa_loss=0.0002377, whisper_loss=0.08866, over 16549.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01161, ecapa_loss=0.0002178, whisper_loss=0.0948, over 3868663.33 frames. ], batch size: 65, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:56:03,432 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
32 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 00:56:28,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0
2024-08-11 00:56:35,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834350.0, ans=0.1
2024-08-11 00:56:46,153 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 from AS
2024-08-11 00:56:50,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834450.0, ans=0.1
2024-08-11 00:57:02,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=834550.0, ans=0.125
2024-08-11 00:57:03,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=834550.0, ans=0.125
2024-08-11 00:57:12,442 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.731e-03
2024-08-11 00:57:13,055 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11000, loss[loss=0.09994, beats_loss=0.01095, ecapa_loss=0.0002032, whisper_loss=0.08696, over 16594.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01155, ecapa_loss=0.0002197, whisper_loss=0.09562, over 3895215.11 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:57:16,547 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 from AS
2024-08-11 00:57:22,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=12.0
2024-08-11 00:57:23,532 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 33 from Vox, 36 from AS
2024-08-11 00:57:25,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=834650.0, ans=0.125
2024-08-11 00:57:30,022 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 from AS
2024-08-11 00:57:36,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=834750.0, ans=0.125
2024-08-11 00:57:38,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=834750.0, ans=0.125
2024-08-11 00:57:41,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=834750.0, ans=0.0
2024-08-11 00:57:46,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=834850.0, ans=0.125
2024-08-11 00:58:06,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.814e+01 3.042e+01 3.466e+01 5.998e+01, threshold=6.084e+01, percent-clipped=1.0
2024-08-11 00:58:19,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835050.0, ans=0.125
2024-08-11 00:58:26,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=835050.0, ans=0.0
2024-08-11 00:58:30,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11050, loss[loss=0.1162, beats_loss=0.00997, ecapa_loss=0.0001911, whisper_loss=0.1044, over 24519.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01147, ecapa_loss=0.0002204, whisper_loss=0.09545, over 3911066.35 frames. ], batch size: 93, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 00:58:45,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=835150.0, ans=0.2
2024-08-11 00:58:54,963 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 00:59:07,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=835350.0, ans=0.125
2024-08-11 00:59:07,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835350.0, ans=0.1
2024-08-11 00:59:44,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=835550.0, ans=0.0
2024-08-11 00:59:45,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=835550.0, ans=0.0
2024-08-11 00:59:55,732 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS
2024-08-11 00:59:58,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11100, loss[loss=0.1132, beats_loss=0.009831, ecapa_loss=0.0002243, whisper_loss=0.1011, over 14184.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01146, ecapa_loss=0.0002207, whisper_loss=0.09501, over 3928370.48 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:00:09,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=835650.0, ans=0.025
2024-08-11 01:00:39,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5
2024-08-11 01:00:42,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0
2024-08-11 01:00:47,185 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 from AS
2024-08-11 01:00:49,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0
2024-08-11 01:00:53,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.772e+01 3.086e+01 3.680e+01 7.620e+01, threshold=6.173e+01, percent-clipped=1.0
2024-08-11 01:01:11,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=836050.0, ans=0.125
2024-08-11 01:01:13,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=836050.0, ans=0.125
2024-08-11 01:01:19,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11150, loss[loss=0.09928, beats_loss=0.01276, ecapa_loss=0.0002142, whisper_loss=0.08438, over 23074.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0115, ecapa_loss=0.0002181, whisper_loss=0.09463, over 3931041.56 frames. ], batch size: 98, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:01:34,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=836250.0, ans=0.0
2024-08-11 01:01:35,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=836250.0, ans=0.125
2024-08-11 01:01:48,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs.
limit=15.0
2024-08-11 01:01:52,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=836350.0, ans=0.125
2024-08-11 01:01:54,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=836350.0, ans=0.0
2024-08-11 01:01:55,632 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 from AS
2024-08-11 01:01:57,187 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 01:02:25,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836550.0, ans=0.0
2024-08-11 01:02:29,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=12.0
2024-08-11 01:02:36,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11200, loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.0002079, whisper_loss=0.09192, over 22050.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01155, ecapa_loss=0.0002171, whisper_loss=0.09385, over 3891235.76 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:02:41,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=836650.0, ans=0.0
2024-08-11 01:02:55,057 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 30 from Vox, 29 from AS
2024-08-11 01:03:21,301 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 36 from LS+wenet, 14 from Vox, 45 from AS
2024-08-11 01:03:25,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=836850.0, ans=0.2
2024-08-11 01:03:35,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.788e+01 3.078e+01 3.604e+01 6.278e+01, threshold=6.156e+01, percent-clipped=2.0
2024-08-11 01:03:52,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837050.0, ans=0.125
2024-08-11 01:03:52,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=837050.0, ans=0.125
2024-08-11 01:03:54,401 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 01:04:01,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11250, loss[loss=0.1159, beats_loss=0.0121, ecapa_loss=0.0002007, whisper_loss=0.1017, over 18758.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01156, ecapa_loss=0.0002157, whisper_loss=0.0947, over 3905756.48 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:04:03,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=837150.0, ans=0.125
2024-08-11 01:04:47,843 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 21 from Vox, 21 from AS
2024-08-11 01:05:24,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2024-08-11 01:05:25,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11300, loss[loss=0.1108, beats_loss=0.01064, ecapa_loss=0.0002537, whisper_loss=0.0976, over 15812.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01155, ecapa_loss=0.000218, whisper_loss=0.09406, over 3873859.38 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:05:48,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0
2024-08-11 01:05:49,700 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS
2024-08-11 01:05:53,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=837750.0, ans=0.125
2024-08-11 01:05:55,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=837750.0, ans=0.125
2024-08-11 01:06:11,766 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 from AS
2024-08-11 01:06:15,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837950.0, ans=0.1
2024-08-11 01:06:21,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.704e+01 3.204e+01 3.789e+01 1.454e+02, threshold=6.408e+01, percent-clipped=1.0
2024-08-11 01:06:27,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838050.0, ans=0.0
2024-08-11 01:06:28,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 29 from Vox, 32 from AS
2024-08-11 01:06:45,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2024-08-11 01:06:45,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11350, loss[loss=0.09609, beats_loss=0.01325, ecapa_loss=0.0002082, whisper_loss=0.08076, over 21405.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01149, ecapa_loss=0.0002191, whisper_loss=0.09482, over 3874555.90 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:07:06,688 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS
2024-08-11 01:07:39,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=838450.0, ans=0.0
2024-08-11 01:07:39,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=838450.0, ans=0.2
2024-08-11 01:07:47,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=838550.0, ans=0.0
2024-08-11 01:07:47,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=838550.0, ans=0.0
2024-08-11 01:07:49,948 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 from AS
2024-08-11 01:08:03,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11400, loss[loss=0.1275, beats_loss=0.009917, ecapa_loss=0.0002336, whisper_loss=0.1152, over 22125.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01157, ecapa_loss=0.0002177, whisper_loss=0.09467, over 3859526.04 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:08:09,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=838650.0, ans=0.5
2024-08-11 01:08:23,420 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 from AS
2024-08-11 01:08:25,417 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 from AS
2024-08-11 01:08:34,407 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
21 from LS+wenet, 28 from Vox, 45 from AS
2024-08-11 01:08:36,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=838850.0, ans=0.125
2024-08-11 01:08:43,050 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.434e+02
2024-08-11 01:08:43,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=838850.0, ans=0.0
2024-08-11 01:08:43,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=838850.0, ans=0.2
2024-08-11 01:08:47,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0
2024-08-11 01:08:51,905 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 14 from Vox, 37 from AS
2024-08-11 01:08:58,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.886e+01 3.314e+01 4.166e+01 1.030e+02, threshold=6.628e+01, percent-clipped=1.0
2024-08-11 01:09:07,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2024-08-11 01:09:09,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=12.0
2024-08-11 01:09:20,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11450, loss[loss=0.1067, beats_loss=0.01217, ecapa_loss=0.0002534, whisper_loss=0.09199, over 20298.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0116, ecapa_loss=0.0002185, whisper_loss=0.09404, over 3855699.92 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:09:39,030 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 13 from Vox, 43 from AS
2024-08-11 01:10:20,942 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS
2024-08-11 01:10:32,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0
2024-08-11 01:10:34,168 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 29 from Vox, 29 from AS
2024-08-11 01:10:44,074 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11500, loss[loss=0.1137, beats_loss=0.01126, ecapa_loss=0.0001893, whisper_loss=0.1006, over 21712.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01151, ecapa_loss=0.0002182, whisper_loss=0.09522, over 3901367.92 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 70368744177664.0
2024-08-11 01:10:46,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839650.0, ans=0.1
2024-08-11 01:10:55,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839650.0, ans=0.1
2024-08-11 01:11:00,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=839750.0, ans=0.0
2024-08-11 01:11:08,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=839750.0, ans=0.2
2024-08-11 01:11:13,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=839750.0, ans=0.125
2024-08-11 01:11:18,682 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 from AS
2024-08-11 01:11:35,100 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 27 from Vox, 38 from AS
2024-08-11 01:11:37,200 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 from AS
2024-08-11 01:11:39,614 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS
2024-08-11 01:11:43,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.719e+01 3.134e+01 3.590e+01 4.797e+01, threshold=6.268e+01, percent-clipped=0.0
2024-08-11 01:12:06,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11550, loss[loss=0.09966, beats_loss=0.01071, ecapa_loss=0.0001765, whisper_loss=0.08719, over 19370.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01142, ecapa_loss=0.0002167, whisper_loss=0.09574, over 3908711.13 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 140737488355328.0
2024-08-11 01:12:07,232 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 13 from Vox, 40 from AS
2024-08-11 01:12:10,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=840150.0, ans=0.125
2024-08-11 01:12:27,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=840250.0, ans=0.5
2024-08-11 01:12:29,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=840250.0, ans=0.0
2024-08-11 01:12:32,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840250.0, ans=0.1
2024-08-11 01:12:37,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=840250.0, ans=0.0
2024-08-11 01:12:39,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840350.0, ans=0.125
2024-08-11 01:12:40,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0
2024-08-11 01:12:58,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=840450.0, ans=0.0
2024-08-11 01:13:00,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=840450.0, ans=0.125
2024-08-11 01:13:21,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=840550.0, ans=0.2
2024-08-11 01:13:27,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11600, loss[loss=0.1002, beats_loss=0.00865, ecapa_loss=0.000267, whisper_loss=0.08887, over 17694.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01144, ecapa_loss=0.0002165, whisper_loss=0.0958, over 3899796.61 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 140737488355328.0
2024-08-11 01:13:41,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=840650.0, ans=0.125
2024-08-11 01:13:42,503 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS
2024-08-11 01:13:42,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=840750.0, ans=0.125
2024-08-11 01:13:49,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=840750.0, ans=0.125
2024-08-11 01:13:52,547 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 28 from Vox, 32 from AS
2024-08-11 01:14:03,777 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 31 from Vox, 28 from AS
2024-08-11 01:14:04,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840850.0, ans=0.1
2024-08-11 01:14:12,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840850.0, ans=0.1
2024-08-11 01:14:19,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=840950.0, ans=0.2
2024-08-11 01:14:23,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.786e+01 3.126e+01 3.591e+01 6.008e+01, threshold=6.251e+01, percent-clipped=0.0
2024-08-11 01:14:29,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=840950.0, ans=0.0
2024-08-11 01:14:31,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.31 vs. limit=22.5
2024-08-11 01:14:36,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=841050.0, ans=12.0
2024-08-11 01:14:41,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=841050.0, ans=0.05
2024-08-11 01:14:44,041 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 from AS
2024-08-11 01:14:47,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11650, loss[loss=0.1284, beats_loss=0.01112, ecapa_loss=0.0002305, whisper_loss=0.115, over 15006.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01155, ecapa_loss=0.0002165, whisper_loss=0.09479, over 3900006.13 frames.
], batch size: 57, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:14:59,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2024-08-11 01:15:00,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2024-08-11 01:15:26,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=841350.0, ans=0.0 2024-08-11 01:15:28,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=841350.0, ans=0.0 2024-08-11 01:15:30,607 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 01:15:36,233 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 01:16:05,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11700, loss[loss=0.0885, beats_loss=0.01291, ecapa_loss=0.0001886, whisper_loss=0.0737, over 17632.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.0002168, whisper_loss=0.09456, over 3893633.82 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:16:13,395 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 01:16:15,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=841650.0, ans=0.125 2024-08-11 01:16:18,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841650.0, ans=0.125 2024-08-11 01:16:18,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841650.0, ans=0.1 2024-08-11 01:16:36,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=841850.0, ans=0.125 2024-08-11 01:16:37,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-11 01:16:42,255 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.535e+05 2024-08-11 01:16:46,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=841850.0, ans=0.125 2024-08-11 01:16:50,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=841950.0, ans=0.125 2024-08-11 01:16:59,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.883e+01 3.187e+01 3.882e+01 5.856e+01, threshold=6.374e+01, percent-clipped=0.0 2024-08-11 01:17:10,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. 
limit=15.0 2024-08-11 01:17:11,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=842050.0, ans=0.2 2024-08-11 01:17:22,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.88 vs. limit=10.0 2024-08-11 01:17:23,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11750, loss[loss=0.1138, beats_loss=0.009031, ecapa_loss=0.0002429, whisper_loss=0.1023, over 16533.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01153, ecapa_loss=0.0002163, whisper_loss=0.09538, over 3902265.28 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:17:41,778 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 01:17:54,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=842350.0, ans=0.2 2024-08-11 01:17:56,737 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 01:18:08,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-08-11 01:18:22,216 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 01:18:23,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2024-08-11 01:18:40,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11800, loss[loss=0.0911, beats_loss=0.01429, ecapa_loss=0.0001676, whisper_loss=0.07513, over 24605.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01165, ecapa_loss=0.0002149, whisper_loss=0.09456, over 3909072.92 frames. 
], batch size: 95, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:18:43,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=842650.0, ans=0.2 2024-08-11 01:18:44,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=842650.0, ans=0.125 2024-08-11 01:19:22,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=842850.0, ans=0.125 2024-08-11 01:19:33,245 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 01:19:38,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+01 2.831e+01 3.248e+01 3.772e+01 8.461e+01, threshold=6.495e+01, percent-clipped=3.0 2024-08-11 01:19:43,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=842950.0, ans=0.0 2024-08-11 01:19:45,114 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 01:19:47,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-11 01:19:52,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2024-08-11 01:19:55,424 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 01:19:58,436 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 01:20:02,315 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 01:20:03,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11850, loss[loss=0.1161, beats_loss=0.01153, ecapa_loss=0.0002234, whisper_loss=0.1024, over 21412.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01163, ecapa_loss=0.0002163, whisper_loss=0.09534, over 3937746.36 frames. ], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:20:24,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-11 01:20:34,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=843350.0, ans=0.125 2024-08-11 01:20:43,614 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 01:20:56,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=843450.0, ans=0.95 2024-08-11 01:21:01,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=843450.0, ans=0.0 2024-08-11 01:21:02,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=843450.0, ans=0.125 2024-08-11 01:21:03,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=843450.0, ans=0.125 2024-08-11 01:21:07,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=843550.0, ans=0.125 2024-08-11 01:21:12,329 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 01:21:13,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-11 01:21:15,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=843550.0, ans=0.125 2024-08-11 01:21:15,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0 2024-08-11 01:21:20,983 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11900, loss[loss=0.09616, beats_loss=0.01106, ecapa_loss=0.0002452, whisper_loss=0.08264, over 18805.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0116, ecapa_loss=0.0002186, whisper_loss=0.09549, over 3933880.14 frames. ], batch size: 75, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:21:39,481 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 01:21:43,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=843750.0, ans=0.0 2024-08-11 01:21:43,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=843750.0, ans=0.0 2024-08-11 01:21:44,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=843750.0, ans=0.07 2024-08-11 01:22:09,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. 
limit=15.0 2024-08-11 01:22:13,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.860e+01 3.257e+01 3.543e+01 6.146e+01, threshold=6.513e+01, percent-clipped=0.0 2024-08-11 01:22:20,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=844050.0, ans=0.125 2024-08-11 01:22:24,379 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 01:22:24,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=844050.0, ans=0.125 2024-08-11 01:22:32,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844050.0, ans=0.1 2024-08-11 01:22:34,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 11950, loss[loss=0.1248, beats_loss=0.008981, ecapa_loss=0.000231, whisper_loss=0.1135, over 22725.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01159, ecapa_loss=0.0002194, whisper_loss=0.09499, over 3882428.65 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:22:42,497 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 01:23:10,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=844350.0, ans=0.125 2024-08-11 01:23:26,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=844450.0, ans=0.2 2024-08-11 01:23:43,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2024-08-11 01:23:53,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12000, loss[loss=0.09907, beats_loss=0.009942, ecapa_loss=0.0002217, whisper_loss=0.08692, over 16564.00 frames. 
], tot_loss[loss=0.1082, beats_loss=0.01155, ecapa_loss=0.00022, whisper_loss=0.0944, over 3852768.04 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:23:53,713 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 01:24:32,737 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on ASR_libri: loss=0.2603, beats_loss=0, ecapa_loss=0.0006879, whisper_loss=0.2534, over 922467.00 frames. 2024-08-11 01:24:50,827 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 01:26:40,357 INFO [train_multi_KD3.py:1149] (3/4) Epoch 6, validation on AT_audioset: loss=0.02599, beats_loss=0.02599, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 01:26:40,361 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 01:26:48,484 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 01:27:07,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=844750.0, ans=0.125 2024-08-11 01:27:13,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=844850.0, ans=0.125 2024-08-11 01:27:29,079 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 01:27:33,714 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 01:27:35,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. 
limit=15.0 2024-08-11 01:27:35,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.989e+01 3.252e+01 3.842e+01 6.267e+01, threshold=6.505e+01, percent-clipped=0.0 2024-08-11 01:27:36,464 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 01:27:54,390 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 01:27:56,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=845050.0, ans=0.0 2024-08-11 01:28:00,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12050, loss[loss=0.1126, beats_loss=0.009127, ecapa_loss=0.0002423, whisper_loss=0.1011, over 21901.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01152, ecapa_loss=0.0002207, whisper_loss=0.09389, over 3862824.98 frames. ], batch size: 88, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:28:17,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845250.0, ans=0.1 2024-08-11 01:28:50,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=845450.0, ans=0.0 2024-08-11 01:28:54,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-11 01:29:04,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=845550.0, ans=0.2 2024-08-11 01:29:09,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=845550.0, ans=0.125 2024-08-11 01:29:17,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12100, loss[loss=0.1054, beats_loss=0.01422, ecapa_loss=0.0001927, whisper_loss=0.08923, over 19023.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01158, ecapa_loss=0.0002204, whisper_loss=0.09355, over 3876259.64 frames. ], batch size: 75, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:29:19,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=845650.0, ans=0.125 2024-08-11 01:29:24,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=845650.0, ans=0.2 2024-08-11 01:29:33,376 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 01:29:33,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=845750.0, ans=0.0 2024-08-11 01:29:35,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=22.5 2024-08-11 01:30:02,195 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 01:30:10,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.614e+01 2.881e+01 3.224e+01 5.170e+01, threshold=5.763e+01, percent-clipped=0.0 2024-08-11 01:30:14,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=845950.0, ans=0.0 2024-08-11 01:30:32,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2024-08-11 01:30:32,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12150, loss[loss=0.1008, beats_loss=0.01325, ecapa_loss=0.0001752, whisper_loss=0.08581, over 22289.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01168, ecapa_loss=0.00022, whisper_loss=0.09236, over 3862219.90 frames. 
], batch size: 88, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:30:33,584 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:30:59,397 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 01:31:13,780 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 32 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-11 01:31:15,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2024-08-11 01:31:18,215 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 01:31:25,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=846450.0, ans=0.125 2024-08-11 01:31:42,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-11 01:31:49,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12200, loss[loss=0.1092, beats_loss=0.01255, ecapa_loss=0.0002167, whisper_loss=0.09449, over 21413.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01157, ecapa_loss=0.0002204, whisper_loss=0.0937, over 3874712.66 frames. ], batch size: 88, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:31:50,295 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 01:32:14,842 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 01:32:17,614 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 01:32:30,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=846850.0, ans=0.125 2024-08-11 01:32:43,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.785e+01 3.124e+01 3.706e+01 5.181e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-11 01:33:08,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12250, loss[loss=0.1172, beats_loss=0.01095, ecapa_loss=0.000182, whisper_loss=0.1044, over 18397.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01158, ecapa_loss=0.0002191, whisper_loss=0.09381, over 3870485.72 frames. ], batch size: 68, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:33:26,492 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 01:33:32,904 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 01:33:34,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.15 vs. limit=22.5 2024-08-11 01:33:43,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=847350.0, ans=0.125 2024-08-11 01:33:44,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=847350.0, ans=10.0 2024-08-11 01:33:56,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=847450.0, ans=0.09899494936611666 2024-08-11 01:34:01,103 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
17 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 01:34:13,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=847550.0, ans=0.125 2024-08-11 01:34:21,680 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 01:34:28,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12300, loss[loss=0.1164, beats_loss=0.009716, ecapa_loss=0.0001941, whisper_loss=0.1047, over 17245.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01162, ecapa_loss=0.0002189, whisper_loss=0.09388, over 3887784.60 frames. ], batch size: 64, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:34:37,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=8.0 2024-08-11 01:34:46,899 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 01:34:47,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847750.0, ans=0.1 2024-08-11 01:34:52,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2024-08-11 01:35:03,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=847850.0, ans=0.2 2024-08-11 01:35:05,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=847850.0, ans=0.125 2024-08-11 01:35:08,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=15.0 2024-08-11 01:35:24,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.835e+01 3.125e+01 3.646e+01 6.261e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 01:35:35,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=848050.0, ans=0.0 2024-08-11 01:35:47,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12350, loss[loss=0.1005, beats_loss=0.01437, ecapa_loss=0.0002169, whisper_loss=0.08395, over 14449.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002194, whisper_loss=0.09481, over 3879452.75 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:35:48,726 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 01:35:50,209 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 01:35:53,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-11 01:35:56,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=848150.0, ans=0.5 2024-08-11 01:36:06,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.67 vs. limit=10.0 2024-08-11 01:36:28,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-11 01:36:36,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. 
limit=15.0 2024-08-11 01:36:39,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-11 01:36:40,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=848450.0, ans=0.125 2024-08-11 01:36:45,440 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-11 01:36:49,478 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 01:36:59,920 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 01:37:00,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=848550.0, ans=0.035 2024-08-11 01:37:02,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12400, loss[loss=0.1066, beats_loss=0.01176, ecapa_loss=0.0002087, whisper_loss=0.09276, over 20042.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01154, ecapa_loss=0.0002195, whisper_loss=0.09446, over 3869406.97 frames. ], batch size: 78, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:37:24,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848750.0, ans=0.125 2024-08-11 01:37:47,657 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 01:37:54,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.646e+01 2.993e+01 3.533e+01 4.877e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 01:38:17,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12450, loss[loss=0.1026, beats_loss=0.01233, ecapa_loss=0.0001614, whisper_loss=0.08866, over 19035.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01155, ecapa_loss=0.0002206, whisper_loss=0.09424, over 3843480.13 frames. ], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:38:31,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=849250.0, ans=0.125 2024-08-11 01:38:43,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-11 01:38:52,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=849350.0, ans=0.125 2024-08-11 01:38:53,040 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 01:39:01,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-08-11 01:39:02,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=849450.0, ans=0.2 2024-08-11 01:39:13,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=849450.0, ans=0.125 2024-08-11 01:39:19,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=849550.0, ans=0.2 2024-08-11 01:39:19,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=849550.0, ans=0.125 2024-08-11 01:39:31,552 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 01:39:32,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12500, loss[loss=0.1008, beats_loss=0.01167, ecapa_loss=0.0001945, whisper_loss=0.08722, over 18465.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.0002187, whisper_loss=0.094, over 3830235.65 frames. ], batch size: 74, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:39:46,668 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 01:39:53,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=849750.0, ans=0.0 2024-08-11 01:39:57,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=849750.0, ans=0.0 2024-08-11 01:39:59,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=849750.0, ans=0.0 2024-08-11 01:40:25,217 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-11 01:40:28,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=849950.0, ans=0.2 2024-08-11 01:40:28,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.816e+01 3.133e+01 3.791e+01 6.148e+01, threshold=6.266e+01, percent-clipped=1.0 2024-08-11 01:40:41,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=850050.0, ans=0.0 2024-08-11 01:40:50,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=850150.0, ans=0.2 2024-08-11 01:40:51,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12550, loss[loss=0.1266, beats_loss=0.01076, ecapa_loss=0.0002139, whisper_loss=0.1137, over 22019.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.0002188, whisper_loss=0.09433, over 3835404.67 frames. 
], batch size: 88, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:41:08,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=12.0 2024-08-11 01:41:12,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=850250.0, ans=0.125 2024-08-11 01:41:12,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=850250.0, ans=0.09899494936611666 2024-08-11 01:41:20,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=850250.0, ans=0.0 2024-08-11 01:41:34,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=850350.0, ans=0.125 2024-08-11 01:41:58,514 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-11 01:42:09,867 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-11 01:42:10,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12600, loss[loss=0.1079, beats_loss=0.0108, ecapa_loss=0.0001783, whisper_loss=0.09527, over 14458.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01148, ecapa_loss=0.0002185, whisper_loss=0.09443, over 3846601.72 frames. 
], batch size: 55, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:42:11,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850650.0, ans=0.1 2024-08-11 01:42:19,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=850650.0, ans=0.125 2024-08-11 01:42:28,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=850750.0, ans=0.0 2024-08-11 01:42:30,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=850750.0, ans=0.0 2024-08-11 01:42:34,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=850750.0, ans=0.125 2024-08-11 01:42:38,346 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS 2024-08-11 01:43:05,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=850950.0, ans=0.2 2024-08-11 01:43:06,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.981e+01 3.398e+01 4.026e+01 7.168e+01, threshold=6.796e+01, percent-clipped=1.0 2024-08-11 01:43:09,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=850950.0, ans=0.125 2024-08-11 01:43:18,122 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 from AS 2024-08-11 01:43:19,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851050.0, ans=0.1 2024-08-11 01:43:26,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.04 vs. 
limit=15.0 2024-08-11 01:43:29,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12650, loss[loss=0.09799, beats_loss=0.01057, ecapa_loss=0.0001706, whisper_loss=0.08571, over 16161.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01155, ecapa_loss=0.0002181, whisper_loss=0.09467, over 3867738.80 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:43:35,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=851150.0, ans=15.0 2024-08-11 01:43:36,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=851150.0, ans=0.0 2024-08-11 01:43:38,625 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 from AS 2024-08-11 01:43:42,346 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 27 from Vox, 13 from AS 2024-08-11 01:43:50,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=851250.0, ans=0.0 2024-08-11 01:43:58,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.65 vs. limit=22.5 2024-08-11 01:44:17,099 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 from AS 2024-08-11 01:44:44,424 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS 2024-08-11 01:44:45,991 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 01:44:47,314 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 01:44:48,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12700, loss[loss=0.1234, beats_loss=0.01213, ecapa_loss=0.0002571, whisper_loss=0.1087, over 16107.00 frames. 
], tot_loss[loss=0.1089, beats_loss=0.01152, ecapa_loss=0.0002181, whisper_loss=0.09519, over 3850768.93 frames. ], batch size: 68, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:45:09,863 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 01:45:25,517 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 29 from Vox, 29 from AS 2024-08-11 01:45:27,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851850.0, ans=0.125 2024-08-11 01:45:34,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=851950.0, ans=0.0 2024-08-11 01:45:40,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.734e+01 2.989e+01 3.425e+01 5.621e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-11 01:45:41,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=851950.0, ans=0.0 2024-08-11 01:45:49,093 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.227e+05 2024-08-11 01:46:03,308 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 01:46:03,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=852150.0, ans=0.0 2024-08-11 01:46:04,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12750, loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0001966, whisper_loss=0.09428, over 23385.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01151, ecapa_loss=0.0002191, whisper_loss=0.09542, over 3878484.48 frames. 
], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:46:32,992 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.360e-01 2024-08-11 01:46:35,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=852350.0, ans=0.0 2024-08-11 01:46:59,501 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 13 from Vox, 44 from AS 2024-08-11 01:47:06,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=852550.0, ans=10.0 2024-08-11 01:47:06,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=852550.0, ans=0.125 2024-08-11 01:47:09,600 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 01:47:19,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12800, loss[loss=0.1094, beats_loss=0.01102, ecapa_loss=0.0002059, whisper_loss=0.09636, over 16348.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01165, ecapa_loss=0.0002177, whisper_loss=0.09482, over 3892381.95 frames. ], batch size: 63, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:47:25,443 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS 2024-08-11 01:47:29,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=852650.0, ans=0.125 2024-08-11 01:47:30,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=852650.0, ans=0.0 2024-08-11 01:47:34,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. 
limit=15.0 2024-08-11 01:47:39,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-11 01:47:54,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=852850.0, ans=0.025 2024-08-11 01:47:57,513 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 from AS 2024-08-11 01:48:01,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=852950.0, ans=0.0 2024-08-11 01:48:07,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=852950.0, ans=0.125 2024-08-11 01:48:09,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.804e+01 3.213e+01 3.707e+01 6.106e+01, threshold=6.425e+01, percent-clipped=1.0 2024-08-11 01:48:10,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.85 vs. limit=22.5 2024-08-11 01:48:18,185 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 from AS 2024-08-11 01:48:18,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=853050.0, ans=0.125 2024-08-11 01:48:30,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12850, loss[loss=0.09554, beats_loss=0.01334, ecapa_loss=0.0002122, whisper_loss=0.08008, over 17213.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01169, ecapa_loss=0.0002178, whisper_loss=0.09446, over 3869964.33 frames. 
], batch size: 71, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:48:40,691 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.402e-03 2024-08-11 01:48:41,667 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 01:48:46,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853250.0, ans=0.125 2024-08-11 01:48:47,943 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 from AS 2024-08-11 01:49:05,045 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-11 01:49:09,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=853350.0, ans=0.125 2024-08-11 01:49:12,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.14 vs. limit=22.5 2024-08-11 01:49:28,924 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-11 01:49:34,925 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 from AS 2024-08-11 01:49:41,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12900, loss[loss=0.1075, beats_loss=0.009125, ecapa_loss=0.0002302, whisper_loss=0.09606, over 15724.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01168, ecapa_loss=0.0002173, whisper_loss=0.09378, over 3841433.13 frames. 
], batch size: 62, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:49:42,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=853650.0, ans=0.125 2024-08-11 01:49:49,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=853650.0, ans=0.2 2024-08-11 01:49:49,994 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 18 from Vox, 46 from AS 2024-08-11 01:49:51,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-11 01:49:58,793 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 01:50:13,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=853850.0, ans=0.125 2024-08-11 01:50:13,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=853850.0, ans=0.125 2024-08-11 01:50:17,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=853850.0, ans=0.125 2024-08-11 01:50:26,426 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 19 from LS+wenet, 18 from Vox, 51 from AS 2024-08-11 01:50:28,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.32 vs. 
limit=15.0 2024-08-11 01:50:32,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.643e+01 2.887e+01 3.353e+01 5.409e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 01:50:41,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=854050.0, ans=0.125 2024-08-11 01:50:53,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2024-08-11 01:50:55,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 12950, loss[loss=0.1012, beats_loss=0.01028, ecapa_loss=0.0002706, whisper_loss=0.0882, over 18957.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01159, ecapa_loss=0.0002185, whisper_loss=0.09398, over 3857054.90 frames. ], batch size: 81, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:51:17,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=12.0 2024-08-11 01:51:22,106 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 from AS 2024-08-11 01:51:48,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=854450.0, ans=0.125 2024-08-11 01:51:51,527 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 01:51:53,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854450.0, ans=0.125 2024-08-11 01:52:05,395 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.885e+05 2024-08-11 01:52:09,356 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 from AS 2024-08-11 01:52:10,913 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
19 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 01:52:11,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13000, loss[loss=0.08945, beats_loss=0.01182, ecapa_loss=0.0002177, whisper_loss=0.07545, over 17878.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01155, ecapa_loss=0.0002185, whisper_loss=0.09365, over 3874232.25 frames. ], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:52:29,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-08-11 01:52:32,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2024-08-11 01:52:36,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=12.0 2024-08-11 01:52:39,172 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 01:52:54,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=12.0 2024-08-11 01:53:05,420 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:53:06,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.697e+01 3.039e+01 3.659e+01 7.134e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-11 01:53:18,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=855050.0, ans=0.125 2024-08-11 01:53:29,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13050, loss[loss=0.1109, beats_loss=0.01037, ecapa_loss=0.0002448, whisper_loss=0.09812, over 16821.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01157, ecapa_loss=0.000217, whisper_loss=0.0939, over 3876410.57 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:53:37,876 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 01:53:42,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=855150.0, ans=0.2 2024-08-11 01:54:00,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=855350.0, ans=0.05 2024-08-11 01:54:04,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=855350.0, ans=0.2 2024-08-11 01:54:14,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855350.0, ans=0.125 2024-08-11 01:54:16,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-11 01:54:30,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=855450.0, ans=0.125 2024-08-11 01:54:39,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=855550.0, ans=0.2 2024-08-11 01:54:39,289 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.597e-01 2024-08-11 01:54:47,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13100, loss[loss=0.09284, beats_loss=0.01123, ecapa_loss=0.0002171, whisper_loss=0.07944, over 18441.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01169, ecapa_loss=0.0002149, whisper_loss=0.09304, over 3875976.37 frames. 
], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:54:49,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0 2024-08-11 01:54:50,399 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-11 01:54:53,627 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 from AS 2024-08-11 01:54:54,893 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 01:55:02,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-11 01:55:16,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=855750.0, ans=0.125 2024-08-11 01:55:17,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-11 01:55:40,051 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 01:55:40,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=855950.0, ans=0.035 2024-08-11 01:55:44,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.870e+01 3.154e+01 3.850e+01 5.715e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 01:55:59,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=856050.0, ans=0.2 2024-08-11 01:56:08,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13150, loss[loss=0.1067, beats_loss=0.01157, ecapa_loss=0.0001962, whisper_loss=0.09316, over 16426.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01157, ecapa_loss=0.0002149, whisper_loss=0.09384, over 3865535.82 frames. ], batch size: 64, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:56:16,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2024-08-11 01:56:18,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856150.0, ans=0.125 2024-08-11 01:56:41,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.87 vs. limit=22.5 2024-08-11 01:56:42,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=856350.0, ans=0.125 2024-08-11 01:56:44,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=856350.0, ans=0.5 2024-08-11 01:56:45,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=856350.0, ans=0.1 2024-08-11 01:56:53,749 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 01:57:03,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=856450.0, ans=0.025 2024-08-11 01:57:10,495 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 from AS 2024-08-11 01:57:12,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=856550.0, ans=0.125 2024-08-11 01:57:13,731 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
16 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 01:57:14,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2024-08-11 01:57:22,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=856550.0, ans=0.05 2024-08-11 01:57:25,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13200, loss[loss=0.1033, beats_loss=0.009044, ecapa_loss=0.0002426, whisper_loss=0.09182, over 22348.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01154, ecapa_loss=0.0002161, whisper_loss=0.09389, over 3854927.00 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:57:34,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856650.0, ans=0.1 2024-08-11 01:57:37,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=856650.0, ans=0.125 2024-08-11 01:57:44,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=856750.0, ans=0.1 2024-08-11 01:57:50,784 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 01:57:51,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.43 vs. 
limit=15.0 2024-08-11 01:58:17,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.805e+01 3.191e+01 3.827e+01 5.209e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 01:58:23,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=857050.0, ans=0.2 2024-08-11 01:58:32,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-11 01:58:33,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=857050.0, ans=0.125 2024-08-11 01:58:35,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.09 vs. limit=22.5 2024-08-11 01:58:36,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857050.0, ans=0.1 2024-08-11 01:58:38,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13250, loss[loss=0.1023, beats_loss=0.01155, ecapa_loss=0.0001976, whisper_loss=0.08877, over 14931.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=0.0002162, whisper_loss=0.09337, over 3833927.28 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:59:02,117 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 01:59:04,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=857250.0, ans=0.0 2024-08-11 01:59:10,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=857350.0, ans=0.0 2024-08-11 01:59:12,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=857350.0, ans=0.125 2024-08-11 01:59:32,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=857450.0, ans=0.0 2024-08-11 01:59:44,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=857550.0, ans=0.95 2024-08-11 01:59:46,414 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 01:59:49,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13300, loss[loss=0.09539, beats_loss=0.01335, ecapa_loss=0.0001913, whisper_loss=0.08013, over 22623.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01146, ecapa_loss=0.0002165, whisper_loss=0.09405, over 3834938.63 frames. ], batch size: 93, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:00:02,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=857750.0, ans=0.125 2024-08-11 02:00:10,826 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-11 02:00:16,927 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 15 from Vox, 41 from AS 2024-08-11 02:00:23,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=857850.0, ans=0.0 2024-08-11 02:00:26,092 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS 2024-08-11 02:00:37,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.722e+01 2.995e+01 3.352e+01 6.535e+01, threshold=5.989e+01, percent-clipped=1.0 2024-08-11 02:00:40,532 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 from AS 2024-08-11 02:00:46,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=858050.0, ans=0.125 2024-08-11 02:00:57,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13350, loss[loss=0.1095, beats_loss=0.01258, ecapa_loss=0.0001997, whisper_loss=0.0949, over 22772.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01156, ecapa_loss=0.0002176, whisper_loss=0.09351, over 3846500.82 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:01:01,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=858150.0, ans=0.125 2024-08-11 02:01:11,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=858250.0, ans=0.0 2024-08-11 02:01:13,178 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
20 from LS+wenet, 27 from Vox, 35 from AS 2024-08-11 02:01:31,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=858350.0, ans=0.125 2024-08-11 02:01:32,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=858350.0, ans=0.0 2024-08-11 02:01:36,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=858350.0, ans=0.125 2024-08-11 02:01:39,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=858450.0, ans=0.2 2024-08-11 02:01:48,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858450.0, ans=0.125 2024-08-11 02:02:00,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=858550.0, ans=0.125 2024-08-11 02:02:04,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13400, loss[loss=0.1282, beats_loss=0.01078, ecapa_loss=0.0002468, whisper_loss=0.1149, over 22433.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01155, ecapa_loss=0.000219, whisper_loss=0.09333, over 3857040.85 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:02:05,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=12.0 2024-08-11 02:02:23,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. 
limit=15.0 2024-08-11 02:02:27,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=858750.0, ans=0.0 2024-08-11 02:02:33,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=858850.0, ans=0.125 2024-08-11 02:02:36,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-11 02:02:40,939 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 02:02:51,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.837e+01 3.208e+01 3.826e+01 8.458e+01, threshold=6.417e+01, percent-clipped=4.0 2024-08-11 02:02:53,016 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 02:02:59,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=859050.0, ans=0.0 2024-08-11 02:02:59,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=859050.0, ans=0.0 2024-08-11 02:03:00,389 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 31 from Vox, 20 fro AS 2024-08-11 02:03:02,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=859050.0, ans=0.125 2024-08-11 02:03:03,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=12.0 2024-08-11 02:03:08,525 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 02:03:11,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13450, loss[loss=0.1226, beats_loss=0.008527, ecapa_loss=0.0002519, whisper_loss=0.1116, over 13706.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01161, ecapa_loss=0.0002179, whisper_loss=0.09322, over 3867745.31 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:03:17,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-11 02:03:17,552 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 02:03:30,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-11 02:03:49,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=859450.0, ans=0.125 2024-08-11 02:03:52,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=859450.0, ans=0.2 2024-08-11 02:03:53,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859450.0, ans=0.1 2024-08-11 02:03:56,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-11 02:04:13,868 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 02:04:17,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=859650.0, ans=0.125 2024-08-11 02:04:18,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13500, loss[loss=0.1128, beats_loss=0.01027, ecapa_loss=0.0001969, whisper_loss=0.1005, over 22023.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01161, ecapa_loss=0.0002156, whisper_loss=0.09391, over 3897552.53 frames. ], batch size: 84, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:04:21,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=859650.0, ans=0.125 2024-08-11 02:04:45,500 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 02:04:52,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-08-11 02:04:53,237 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 02:05:04,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.793e+01 3.249e+01 3.860e+01 6.225e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-11 02:05:21,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=860050.0, ans=0.125 2024-08-11 02:05:24,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13550, loss[loss=0.08844, beats_loss=0.01418, ecapa_loss=0.0002215, whisper_loss=0.07205, over 15784.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01162, ecapa_loss=0.0002161, whisper_loss=0.09389, over 3903674.32 frames. ], batch size: 65, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:05:33,801 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 02:05:34,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=860150.0, ans=0.2 2024-08-11 02:05:46,233 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 02:05:49,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-11 02:05:51,516 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 02:06:03,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860350.0, ans=0.1 2024-08-11 02:06:06,656 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 02:06:15,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=860450.0, ans=0.125 2024-08-11 02:06:15,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=860450.0, ans=0.09899494936611666 2024-08-11 02:06:17,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=860450.0, ans=0.125 2024-08-11 02:06:18,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=860450.0, ans=0.2 2024-08-11 02:06:23,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. 
limit=10.0 2024-08-11 02:06:29,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=860550.0, ans=0.125 2024-08-11 02:06:30,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.19 vs. limit=22.5 2024-08-11 02:06:34,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13600, loss[loss=0.09585, beats_loss=0.01152, ecapa_loss=0.0002945, whisper_loss=0.08138, over 15789.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0117, ecapa_loss=0.0002149, whisper_loss=0.09347, over 3900627.79 frames. ], batch size: 67, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:06:37,824 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 02:06:41,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=860650.0, ans=0.125 2024-08-11 02:07:13,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=860850.0, ans=0.0 2024-08-11 02:07:22,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=15.0 2024-08-11 02:07:23,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.998e+01 3.369e+01 4.005e+01 6.707e+01, threshold=6.738e+01, percent-clipped=1.0 2024-08-11 02:07:44,131 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13650, loss[loss=0.09003, beats_loss=0.0145, ecapa_loss=0.0001711, whisper_loss=0.07381, over 22836.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01175, ecapa_loss=0.0002139, whisper_loss=0.09376, over 3911240.25 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:08:13,289 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-11 02:08:24,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861350.0, ans=0.1 2024-08-11 02:08:33,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861450.0, ans=0.0 2024-08-11 02:08:33,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2024-08-11 02:08:36,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=861450.0, ans=0.2 2024-08-11 02:08:41,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=861550.0, ans=0.0 2024-08-11 02:08:44,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=861550.0, ans=0.125 2024-08-11 02:08:47,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=861550.0, ans=0.125 2024-08-11 02:08:47,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2024-08-11 02:08:53,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861650.0, ans=0.1 2024-08-11 02:08:54,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13700, loss[loss=0.09836, beats_loss=0.01319, ecapa_loss=0.0001737, whisper_loss=0.08343, over 19153.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01162, ecapa_loss=0.0002142, whisper_loss=0.09477, over 3920485.40 frames. 
], batch size: 74, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:09:00,130 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 38 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 02:09:00,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=861650.0, ans=0.0 2024-08-11 02:09:04,840 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.135e-01 2024-08-11 02:09:06,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=12.0 2024-08-11 02:09:12,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=861750.0, ans=0.0 2024-08-11 02:09:23,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861850.0, ans=0.1 2024-08-11 02:09:28,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=861850.0, ans=0.0 2024-08-11 02:09:43,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2024-08-11 02:09:44,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.742e+01 3.072e+01 3.573e+01 1.415e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 02:09:56,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862050.0, ans=0.1 2024-08-11 02:09:58,304 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 02:09:58,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=862050.0, ans=0.125 2024-08-11 02:10:02,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=862050.0, ans=0.125 2024-08-11 02:10:05,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13750, loss[loss=0.1048, beats_loss=0.01134, ecapa_loss=0.0002058, whisper_loss=0.09143, over 16104.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01155, ecapa_loss=0.0002148, whisper_loss=0.09495, over 3919191.87 frames. ], batch size: 65, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:10:28,007 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-11 02:10:31,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=12.0 2024-08-11 02:10:39,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-08-11 02:10:53,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. 
limit=22.5 2024-08-11 02:10:57,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=862450.0, ans=0.0 2024-08-11 02:10:58,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862450.0, ans=0.1 2024-08-11 02:11:11,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=862550.0, ans=0.0 2024-08-11 02:11:14,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13800, loss[loss=0.118, beats_loss=0.008964, ecapa_loss=0.0002968, whisper_loss=0.1061, over 21249.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01156, ecapa_loss=0.000216, whisper_loss=0.09431, over 3898992.19 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:11:18,524 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 02:11:26,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=862650.0, ans=0.125 2024-08-11 02:11:31,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=862750.0, ans=0.1 2024-08-11 02:11:33,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=862750.0, ans=0.0 2024-08-11 02:11:35,320 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 02:11:38,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=862750.0, ans=0.125 2024-08-11 02:11:59,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. 
limit=6.0 2024-08-11 02:12:04,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.614e+01 2.961e+01 3.435e+01 1.383e+02, threshold=5.922e+01, percent-clipped=1.0 2024-08-11 02:12:14,367 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 02:12:15,787 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 02:12:16,951 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 02:12:26,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13850, loss[loss=0.102, beats_loss=0.01025, ecapa_loss=0.0001969, whisper_loss=0.08978, over 20253.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0116, ecapa_loss=0.0002136, whisper_loss=0.09367, over 3897651.81 frames. ], batch size: 78, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:12:26,608 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 02:12:37,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-11 02:12:38,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:12:42,286 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 02:13:11,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=863450.0, ans=0.0 2024-08-11 02:13:20,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=863450.0, ans=0.125 2024-08-11 02:13:28,925 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 02:13:33,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=863550.0, ans=0.025 2024-08-11 02:13:36,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13900, loss[loss=0.09494, beats_loss=0.01242, ecapa_loss=0.0002167, whisper_loss=0.08035, over 15978.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01151, ecapa_loss=0.000214, whisper_loss=0.09437, over 3900469.33 frames. ], batch size: 66, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:13:47,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=15.0 2024-08-11 02:13:47,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=863650.0, ans=0.5 2024-08-11 02:14:23,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.760e+01 3.035e+01 3.739e+01 6.215e+01, threshold=6.069e+01, percent-clipped=1.0 2024-08-11 02:14:28,837 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.035e-02 2024-08-11 02:14:42,274 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 13950, loss[loss=0.09147, beats_loss=0.01162, ecapa_loss=0.0002615, whisper_loss=0.07723, over 13073.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01146, ecapa_loss=0.0002151, whisper_loss=0.0947, over 3876158.23 frames. 
], batch size: 56, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:14:48,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=864150.0, ans=22.5 2024-08-11 02:15:11,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=864350.0, ans=0.2 2024-08-11 02:15:14,808 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 02:15:15,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=864350.0, ans=10.0 2024-08-11 02:15:18,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-11 02:15:21,481 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 02:15:24,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=864450.0, ans=0.1 2024-08-11 02:15:36,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=864550.0, ans=10.0 2024-08-11 02:15:47,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14000, loss[loss=0.1215, beats_loss=0.01093, ecapa_loss=0.0002311, whisper_loss=0.1082, over 20971.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01155, ecapa_loss=0.0002149, whisper_loss=0.09473, over 3888573.03 frames. ], batch size: 86, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:16:08,762 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 02:16:19,302 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 02:16:33,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.879e+01 3.227e+01 3.709e+01 6.302e+01, threshold=6.454e+01, percent-clipped=1.0 2024-08-11 02:16:34,665 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 02:16:38,852 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 02:16:41,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=865050.0, ans=0.125 2024-08-11 02:16:52,034 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:16:52,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14050, loss[loss=0.1143, beats_loss=0.01159, ecapa_loss=0.0002099, whisper_loss=0.1006, over 22552.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01157, ecapa_loss=0.0002138, whisper_loss=0.09524, over 3920695.67 frames. ], batch size: 89, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:16:52,956 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 02:17:08,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=865250.0, ans=0.0 2024-08-11 02:17:11,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=865250.0, ans=0.0 2024-08-11 02:17:12,171 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 02:17:19,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=865350.0, ans=0.0 2024-08-11 02:17:57,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14100, loss[loss=0.09904, beats_loss=0.01071, ecapa_loss=0.0002093, whisper_loss=0.08624, over 20816.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01156, ecapa_loss=0.0002149, whisper_loss=0.0942, over 3887008.38 frames. ], batch size: 85, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:18:10,343 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 02:18:16,859 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 02:18:26,473 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 02:18:26,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=865850.0, ans=0.125 2024-08-11 02:18:31,920 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 02:18:34,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865850.0, ans=0.1 2024-08-11 02:18:37,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=865950.0, ans=0.05 2024-08-11 02:18:37,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865950.0, ans=0.125 2024-08-11 02:18:39,832 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 02:18:44,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.713e+01 2.992e+01 3.543e+01 5.369e+01, threshold=5.983e+01, percent-clipped=0.0 2024-08-11 02:18:45,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865950.0, ans=0.125 2024-08-11 02:18:46,663 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 02:18:46,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=865950.0, ans=0.0 2024-08-11 02:18:54,556 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 02:19:03,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=866050.0, ans=0.0 2024-08-11 02:19:04,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14150, loss[loss=0.09269, beats_loss=0.01156, ecapa_loss=0.000204, whisper_loss=0.07909, over 17885.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01161, ecapa_loss=0.0002146, whisper_loss=0.09401, over 3902032.96 frames. ], batch size: 70, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:19:13,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=866150.0, ans=0.125 2024-08-11 02:19:15,344 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 02:19:50,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=866450.0, ans=0.125 2024-08-11 02:19:56,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=866550.0, ans=0.2 2024-08-11 02:19:56,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=866550.0, ans=0.09899494936611666 2024-08-11 02:19:57,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866550.0, ans=0.125 2024-08-11 02:20:07,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=866550.0, ans=0.125 2024-08-11 02:20:10,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14200, loss[loss=0.1083, beats_loss=0.009934, ecapa_loss=0.0001905, whisper_loss=0.09643, over 15645.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01153, ecapa_loss=0.0002142, whisper_loss=0.09434, over 3892362.02 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:20:12,093 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 02:20:24,042 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-11 02:20:38,701 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 02:20:40,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=866850.0, ans=0.125 2024-08-11 02:20:57,904 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.819e+01 3.173e+01 3.823e+01 7.553e+01, threshold=6.347e+01, percent-clipped=1.0 2024-08-11 02:20:59,324 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 02:21:11,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=867050.0, ans=0.0 2024-08-11 02:21:19,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14250, loss[loss=0.1268, beats_loss=0.01089, ecapa_loss=0.0001933, whisper_loss=0.114, over 16563.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01154, ecapa_loss=0.0002127, whisper_loss=0.09462, over 3906276.99 frames. ], batch size: 66, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:21:26,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=867150.0, ans=0.125 2024-08-11 02:21:38,658 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 02:21:44,179 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:21:52,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=867350.0, ans=0.125 2024-08-11 02:21:55,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.38 vs. limit=22.5 2024-08-11 02:21:55,902 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 12 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 02:22:05,088 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 02:22:21,575 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 02:22:26,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14300, loss[loss=0.1121, beats_loss=0.009071, ecapa_loss=0.0002085, whisper_loss=0.101, over 17634.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01157, ecapa_loss=0.0002114, whisper_loss=0.09433, over 3909167.14 frames. ], batch size: 68, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:22:31,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:22:32,357 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 02:22:41,751 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 39 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 02:22:41,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=867750.0, ans=0.2 2024-08-11 02:23:08,581 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 6 from Vox, 35 fro AS 2024-08-11 02:23:09,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.53 vs. limit=22.5 2024-08-11 02:23:11,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.633e+01 2.947e+01 3.319e+01 6.322e+01, threshold=5.893e+01, percent-clipped=0.0 2024-08-11 02:23:21,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=868050.0, ans=0.0 2024-08-11 02:23:24,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=868050.0, ans=0.0 2024-08-11 02:23:25,236 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 02:23:30,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=868150.0, ans=0.0 2024-08-11 02:23:31,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14350, loss[loss=0.1039, beats_loss=0.01029, ecapa_loss=0.0002103, whisper_loss=0.0915, over 14365.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01157, ecapa_loss=0.0002115, whisper_loss=0.09401, over 3909579.52 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:23:36,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5 2024-08-11 02:23:39,538 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 02:23:43,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=868250.0, ans=0.2 2024-08-11 02:23:43,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. 
limit=15.0 2024-08-11 02:23:52,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=868250.0, ans=0.125 2024-08-11 02:24:05,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=868350.0, ans=0.125 2024-08-11 02:24:10,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=868450.0, ans=0.0 2024-08-11 02:24:17,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=868450.0, ans=0.95 2024-08-11 02:24:27,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=868550.0, ans=0.125 2024-08-11 02:24:30,619 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 02:24:32,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-08-11 02:24:35,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14400, loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002353, whisper_loss=0.09207, over 21506.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01163, ecapa_loss=0.0002135, whisper_loss=0.09372, over 3940858.57 frames. ], batch size: 89, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:24:43,814 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 02:24:55,309 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 02:25:03,456 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 02:25:05,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868850.0, ans=0.1 2024-08-11 02:25:06,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=868850.0, ans=0.125 2024-08-11 02:25:17,542 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 02:25:21,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.691e+01 3.158e+01 3.511e+01 8.025e+01, threshold=6.317e+01, percent-clipped=1.0 2024-08-11 02:25:35,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=869050.0, ans=0.2 2024-08-11 02:25:40,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 6, batch 14450, loss[loss=0.123, beats_loss=0.01004, ecapa_loss=0.0002069, whisper_loss=0.1109, over 17760.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01151, ecapa_loss=0.000216, whisper_loss=0.09455, over 3921792.77 frames. ], batch size: 69, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:25:41,162 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 02:25:45,919 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 02:25:46,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=12.0 2024-08-11 02:25:47,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=869150.0, ans=10.0 2024-08-11 02:25:53,375 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 02:26:02,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-11 02:26:18,867 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.972e-01 2024-08-11 02:26:23,559 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 02:26:23,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=869450.0, ans=0.1 2024-08-11 02:26:26,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=869450.0, ans=0.2 2024-08-11 02:26:26,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0 2024-08-11 02:27:16,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 0, loss[loss=0.1094, beats_loss=0.01108, ecapa_loss=0.0002208, whisper_loss=0.09612, over 21864.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01108, ecapa_loss=0.0002208, whisper_loss=0.09612, over 21864.00 frames. ], batch size: 87, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:27:16,281 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 02:28:00,278 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006864, whisper_loss=0.2518, over 922467.00 frames. 2024-08-11 02:28:18,623 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on SV_voxceleb1: loss=0.00579, beats_loss=0, ecapa_loss=0.000579, whisper_loss=0, over 939242.00 frames. 
2024-08-11 02:29:01,124 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5559, 1.7428, 1.6949, 1.3590], device='cuda:3') 2024-08-11 02:30:27,744 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on AT_audioset: loss=0.02579, beats_loss=0.02579, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 02:30:27,747 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 02:30:27,928 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 02:30:52,686 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 02:31:08,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=869690.0, ans=0.0 2024-08-11 02:31:33,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=869790.0, ans=0.125 2024-08-11 02:32:23,685 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 02:32:35,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.976e+01 3.314e+01 3.996e+01 6.220e+01, threshold=6.628e+01, percent-clipped=0.0 2024-08-11 02:33:11,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 50, loss[loss=0.1045, beats_loss=0.01375, ecapa_loss=0.000143, whisper_loss=0.0893, over 22944.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01126, ecapa_loss=0.0002112, whisper_loss=0.09062, over 884004.66 frames. ], batch size: 88, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:35:16,078 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 02:36:09,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=870490.0, ans=0.0 2024-08-11 02:36:18,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 100, loss[loss=0.07843, beats_loss=0.01284, ecapa_loss=0.0002531, whisper_loss=0.06305, over 17693.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.000216, whisper_loss=0.09307, over 1552580.57 frames. ], batch size: 74, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:37:24,910 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 02:37:25,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=870690.0, ans=0.125 2024-08-11 02:37:34,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=870790.0, ans=0.0 2024-08-11 02:37:39,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870790.0, ans=0.1 2024-08-11 02:38:29,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-08-11 02:38:32,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-11 02:38:32,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 3.124e+01 3.380e+01 3.805e+01 6.032e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-11 02:38:49,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 150, loss[loss=0.08893, beats_loss=0.01406, ecapa_loss=0.0001775, whisper_loss=0.07309, over 22175.00 frames. 
], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.0002112, whisper_loss=0.09287, over 2066239.42 frames. ], batch size: 89, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:39:02,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=871090.0, ans=0.125 2024-08-11 02:39:07,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=871190.0, ans=0.0 2024-08-11 02:39:07,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871190.0, ans=0.1 2024-08-11 02:39:27,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=871290.0, ans=0.0 2024-08-11 02:39:40,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=871290.0, ans=15.0 2024-08-11 02:39:42,249 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 02:40:02,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=871490.0, ans=0.05 2024-08-11 02:40:08,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=871490.0, ans=0.0 2024-08-11 02:40:08,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=871490.0, ans=0.125 2024-08-11 02:40:12,101 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.042e+00 2024-08-11 02:40:16,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 200, loss[loss=0.153, beats_loss=0.00701, ecapa_loss=0.0002261, whisper_loss=0.1437, over 17869.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01109, ecapa_loss=0.0002109, whisper_loss=0.0937, over 2442842.81 frames. ], batch size: 66, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:40:16,406 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 02:40:40,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=871690.0, ans=0.2 2024-08-11 02:40:59,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=871790.0, ans=0.125 2024-08-11 02:41:05,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=871890.0, ans=0.2 2024-08-11 02:41:06,129 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:41:21,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.791e+01 3.109e+01 3.398e+01 1.022e+02, threshold=6.218e+01, percent-clipped=1.0 2024-08-11 02:41:22,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871990.0, ans=0.1 2024-08-11 02:41:27,858 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 02:41:35,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 250, loss[loss=0.1052, beats_loss=0.01358, ecapa_loss=0.0002074, whisper_loss=0.08957, over 21342.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0112, ecapa_loss=0.0002108, whisper_loss=0.09319, over 2761614.74 frames. ], batch size: 85, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:41:52,193 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 02:41:53,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-11 02:42:08,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=872290.0, ans=0.125 2024-08-11 02:42:12,373 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 29 from LS+wenet, 6 from Vox, 24 fro AS 2024-08-11 02:42:16,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=872290.0, ans=0.125 2024-08-11 02:42:20,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=872290.0, ans=0.0 2024-08-11 02:42:20,730 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.112e-03 2024-08-11 02:42:57,954 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 300, loss[loss=0.08692, beats_loss=0.0134, ecapa_loss=0.0001695, whisper_loss=0.07182, over 19729.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01125, ecapa_loss=0.0002108, whisper_loss=0.09299, over 2994844.74 frames. ], batch size: 76, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:43:15,323 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 02:43:31,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.56 vs. 
limit=15.0 2024-08-11 02:43:37,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=872790.0, ans=0.0 2024-08-11 02:43:38,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=872790.0, ans=0.0 2024-08-11 02:44:02,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.643e+01 2.910e+01 3.334e+01 5.693e+01, threshold=5.820e+01, percent-clipped=0.0 2024-08-11 02:44:04,020 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:44:15,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 350, loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0002007, whisper_loss=0.09305, over 17289.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.0002084, whisper_loss=0.09348, over 3204963.57 frames. ], batch size: 68, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:44:16,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=873090.0, ans=0.125 2024-08-11 02:44:16,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-11 02:44:19,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-11 02:44:25,872 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 02:44:27,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873090.0, ans=0.1 2024-08-11 02:44:31,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=873190.0, ans=0.125 2024-08-11 02:44:40,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=15.0 2024-08-11 02:44:58,364 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 02:45:01,425 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 02:45:04,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=873390.0, ans=0.125 2024-08-11 02:45:10,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=873390.0, ans=0.125 2024-08-11 02:45:15,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=873390.0, ans=0.1 2024-08-11 02:45:15,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873390.0, ans=0.0 2024-08-11 02:45:18,490 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 02:45:19,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2024-08-11 02:45:21,158 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 02:45:32,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 400, loss[loss=0.108, beats_loss=0.01106, ecapa_loss=0.0001548, whisper_loss=0.09536, over 16277.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01127, ecapa_loss=0.0002086, whisper_loss=0.09282, over 3326874.86 frames. ], batch size: 60, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:45:44,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=873590.0, ans=0.125 2024-08-11 02:45:47,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=12.0 2024-08-11 02:45:55,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=873690.0, ans=0.125 2024-08-11 02:45:59,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=873690.0, ans=0.125 2024-08-11 02:46:00,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873690.0, ans=0.1 2024-08-11 02:46:02,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=873790.0, ans=0.125 2024-08-11 02:46:04,709 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 02:46:23,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=873890.0, ans=0.2 2024-08-11 02:46:34,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873990.0, ans=0.1 2024-08-11 02:46:35,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.580e+01 2.895e+01 3.398e+01 1.445e+02, threshold=5.790e+01, percent-clipped=1.0 2024-08-11 02:46:39,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=873990.0, ans=0.125 2024-08-11 02:46:48,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 450, loss[loss=0.09851, beats_loss=0.0122, ecapa_loss=0.0002039, whisper_loss=0.08426, over 22831.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01146, ecapa_loss=0.0002062, whisper_loss=0.09146, over 3440616.94 frames. ], batch size: 89, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:46:59,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=874090.0, ans=0.05 2024-08-11 02:47:05,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=874190.0, ans=0.0 2024-08-11 02:47:12,238 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 02:47:13,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2024-08-11 02:47:27,882 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 02:47:36,153 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 02:47:39,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=874390.0, ans=0.125 2024-08-11 02:47:49,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874490.0, ans=0.0 2024-08-11 02:47:51,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874490.0, ans=0.1 2024-08-11 02:47:52,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=874490.0, ans=0.0 2024-08-11 02:47:54,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=12.0 2024-08-11 02:47:57,711 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.332e-02 2024-08-11 02:48:02,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 500, loss[loss=0.1115, beats_loss=0.009976, ecapa_loss=0.0002029, whisper_loss=0.09952, over 18765.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01146, ecapa_loss=0.0002061, whisper_loss=0.09131, over 3516925.79 frames. ], batch size: 76, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:48:06,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5 2024-08-11 02:48:07,318 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 02:48:17,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=874690.0, ans=0.1 2024-08-11 02:48:30,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=874790.0, ans=0.0 2024-08-11 02:48:40,024 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 02:48:40,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.53 vs. limit=10.0 2024-08-11 02:48:56,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=874990.0, ans=0.125 2024-08-11 02:48:58,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.783e+01 3.369e+01 3.762e+01 6.753e+01, threshold=6.739e+01, percent-clipped=3.0 2024-08-11 02:49:01,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=874990.0, ans=0.0 2024-08-11 02:49:04,032 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:49:10,329 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 550, loss[loss=0.1043, beats_loss=0.007639, ecapa_loss=0.0002031, whisper_loss=0.09464, over 18792.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01149, ecapa_loss=0.000207, whisper_loss=0.09127, over 3586120.06 frames. 
], batch size: 71, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:49:18,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=875090.0, ans=0.125 2024-08-11 02:49:20,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=875090.0, ans=0.2 2024-08-11 02:49:26,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=875190.0, ans=0.2 2024-08-11 02:49:39,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2024-08-11 02:49:44,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875290.0, ans=0.125 2024-08-11 02:49:55,102 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 02:49:58,935 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 02:50:10,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-11 02:50:11,487 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 02:50:14,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=875590.0, ans=0.125 2024-08-11 02:50:15,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 600, loss[loss=0.1024, beats_loss=0.008695, ecapa_loss=0.0002531, whisper_loss=0.09115, over 16158.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01143, ecapa_loss=0.0002063, whisper_loss=0.09242, over 3655052.43 frames. 
], batch size: 67, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:50:16,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.05 vs. limit=22.5 2024-08-11 02:50:28,950 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.212e-01 2024-08-11 02:50:37,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-11 02:50:40,707 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 02:50:47,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-11 02:50:50,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=875790.0, ans=22.5 2024-08-11 02:51:01,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=875890.0, ans=0.125 2024-08-11 02:51:05,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=875890.0, ans=0.0 2024-08-11 02:51:09,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.703e+01 3.008e+01 3.347e+01 4.794e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 02:51:13,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=875990.0, ans=0.2 2024-08-11 02:51:20,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 650, loss[loss=0.1144, beats_loss=0.01017, ecapa_loss=0.0002063, whisper_loss=0.1022, over 16366.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01141, ecapa_loss=0.0002072, whisper_loss=0.09254, over 3697201.07 frames. ], batch size: 64, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:51:21,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876090.0, ans=0.125 2024-08-11 02:51:24,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=876090.0, ans=0.125 2024-08-11 02:51:56,166 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:52:12,000 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 02:52:14,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=876490.0, ans=0.1 2024-08-11 02:52:26,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 700, loss[loss=0.09087, beats_loss=0.01396, ecapa_loss=0.0002034, whisper_loss=0.07488, over 21438.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.0002069, whisper_loss=0.09287, over 3749795.60 frames. ], batch size: 90, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:52:29,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2024-08-11 02:52:50,165 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 02:53:01,808 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 02:53:06,847 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 29 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 02:53:13,196 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 02:53:13,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876890.0, ans=0.1 2024-08-11 02:53:19,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.856e+01 3.234e+01 3.790e+01 5.945e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-11 02:53:31,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 750, loss[loss=0.1137, beats_loss=0.01225, ecapa_loss=0.0001762, whisper_loss=0.09964, over 23576.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01138, ecapa_loss=0.0002056, whisper_loss=0.09273, over 3761291.09 frames. ], batch size: 90, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:53:44,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=877190.0, ans=0.0 2024-08-11 02:53:51,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=877190.0, ans=0.0 2024-08-11 02:54:04,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877290.0, ans=0.125 2024-08-11 02:54:04,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2024-08-11 02:54:05,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.55 vs. 
limit=22.5 2024-08-11 02:54:15,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=877390.0, ans=0.09899494936611666 2024-08-11 02:54:26,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=877490.0, ans=0.05 2024-08-11 02:54:36,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 800, loss[loss=0.1084, beats_loss=0.01045, ecapa_loss=0.0001943, whisper_loss=0.096, over 19175.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01139, ecapa_loss=0.0002048, whisper_loss=0.09285, over 3776903.39 frames. ], batch size: 74, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:54:36,541 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 02:54:39,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=877590.0, ans=0.125 2024-08-11 02:54:52,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=877690.0, ans=0.2 2024-08-11 02:55:03,608 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 02:55:03,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877790.0, ans=0.1 2024-08-11 02:55:04,712 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 02:55:16,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=877890.0, ans=0.2 2024-08-11 02:55:25,833 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 02:55:28,462 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
31 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 02:55:29,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.644e+01 2.972e+01 3.441e+01 7.984e+01, threshold=5.944e+01, percent-clipped=1.0 2024-08-11 02:55:30,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=877990.0, ans=0.0 2024-08-11 02:55:32,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877990.0, ans=0.125 2024-08-11 02:55:33,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=877990.0, ans=0.2 2024-08-11 02:55:35,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=877990.0, ans=0.2 2024-08-11 02:55:41,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 850, loss[loss=0.1296, beats_loss=0.01049, ecapa_loss=0.0002122, whisper_loss=0.117, over 19210.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01146, ecapa_loss=0.0002027, whisper_loss=0.0919, over 3771395.86 frames. ], batch size: 72, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:55:41,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=878090.0, ans=0.0 2024-08-11 02:55:44,039 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 02:55:57,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=878190.0, ans=0.0 2024-08-11 02:56:06,046 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 02:56:10,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=878290.0, ans=0.0 2024-08-11 02:56:15,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=878290.0, ans=0.02 2024-08-11 02:56:16,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=22.5 2024-08-11 02:56:17,166 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 34 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:56:27,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=12.0 2024-08-11 02:56:35,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878490.0, ans=0.1 2024-08-11 02:56:36,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=878490.0, ans=0.125 2024-08-11 02:56:38,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=878490.0, ans=0.125 2024-08-11 02:56:41,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=878490.0, ans=0.0 2024-08-11 02:56:45,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 900, loss[loss=0.106, beats_loss=0.01365, ecapa_loss=0.0001644, whisper_loss=0.09066, over 22797.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01136, ecapa_loss=0.0002028, whisper_loss=0.09271, over 3776891.70 frames. 
], batch size: 89, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:56:53,647 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:57:03,582 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 02:57:08,552 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-11 02:57:11,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=878790.0, ans=0.09899494936611666 2024-08-11 02:57:27,921 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 02:57:38,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.614e+01 2.988e+01 3.449e+01 5.810e+01, threshold=5.976e+01, percent-clipped=0.0 2024-08-11 02:57:40,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=878990.0, ans=0.125 2024-08-11 02:57:42,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2024-08-11 02:57:50,378 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 02:57:51,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 950, loss[loss=0.1226, beats_loss=0.01029, ecapa_loss=0.0002166, whisper_loss=0.1101, over 16075.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01129, ecapa_loss=0.0002017, whisper_loss=0.09287, over 3776460.41 frames. 
], batch size: 65, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:58:20,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=879290.0, ans=0.0 2024-08-11 02:58:23,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=879290.0, ans=0.125 2024-08-11 02:59:00,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1000, loss[loss=0.1174, beats_loss=0.009801, ecapa_loss=0.0001872, whisper_loss=0.1057, over 19468.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01134, ecapa_loss=0.000203, whisper_loss=0.09225, over 3796010.68 frames. ], batch size: 73, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:59:12,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=879590.0, ans=0.125 2024-08-11 02:59:14,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-08-11 02:59:40,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=879790.0, ans=0.0 2024-08-11 02:59:45,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=879890.0, ans=0.125 2024-08-11 02:59:46,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=879890.0, ans=0.0 2024-08-11 02:59:49,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=879890.0, ans=0.125 2024-08-11 02:59:51,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.89 vs. 
limit=15.0 2024-08-11 03:00:01,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.796e+01 3.092e+01 3.418e+01 4.355e+01, threshold=6.184e+01, percent-clipped=0.0 2024-08-11 03:00:02,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-11 03:00:07,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=879990.0, ans=0.0 2024-08-11 03:00:13,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1050, loss[loss=0.1033, beats_loss=0.01044, ecapa_loss=0.0001847, whisper_loss=0.09099, over 17342.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01136, ecapa_loss=0.000202, whisper_loss=0.09275, over 3780456.62 frames. ], batch size: 67, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:00:14,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=880090.0, ans=0.125 2024-08-11 03:00:24,275 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 03:00:50,452 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 27 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-11 03:01:03,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=880390.0, ans=0.2 2024-08-11 03:01:08,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=880390.0, ans=0.125 2024-08-11 03:01:12,134 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 03:01:13,958 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 03:01:15,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=880490.0, ans=0.125 2024-08-11 03:01:27,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1100, loss[loss=0.08726, beats_loss=0.01134, ecapa_loss=0.0001829, whisper_loss=0.07409, over 16556.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01123, ecapa_loss=0.0002008, whisper_loss=0.09384, over 3785029.60 frames. ], batch size: 65, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:01:42,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=880690.0, ans=0.125 2024-08-11 03:01:50,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880690.0, ans=0.1 2024-08-11 03:01:54,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=880690.0, ans=0.2 2024-08-11 03:01:56,501 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 03:02:20,195 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 03:02:24,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=880890.0, ans=0.0 2024-08-11 03:02:27,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.648e+01 3.166e+01 3.461e+01 5.758e+01, threshold=6.333e+01, percent-clipped=0.0 2024-08-11 03:02:40,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1150, loss[loss=0.1112, beats_loss=0.00929, ecapa_loss=0.0002043, whisper_loss=0.09989, over 19136.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01121, ecapa_loss=0.0002024, whisper_loss=0.09404, over 3773096.87 frames. 
], batch size: 75, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:02:44,120 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 03:02:52,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881090.0, ans=0.125 2024-08-11 03:03:00,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-11 03:03:01,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=881190.0, ans=0.0 2024-08-11 03:03:11,277 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 15 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 03:03:13,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=881290.0, ans=0.05 2024-08-11 03:03:24,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=881390.0, ans=0.2 2024-08-11 03:03:39,390 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 03:03:45,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=881490.0, ans=0.125 2024-08-11 03:03:52,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1200, loss[loss=0.06923, beats_loss=0.01458, ecapa_loss=0.0001502, whisper_loss=0.05315, over 13469.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01126, ecapa_loss=0.0002026, whisper_loss=0.09301, over 3767176.93 frames. ], batch size: 54, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:04:15,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2024-08-11 03:04:18,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=881690.0, ans=0.0 2024-08-11 03:04:43,695 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 03:04:51,199 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 03:04:52,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.507e+01 2.887e+01 3.348e+01 4.586e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 03:04:52,593 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 03:04:53,851 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 03:05:05,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1250, loss[loss=0.07552, beats_loss=0.01448, ecapa_loss=0.0001925, whisper_loss=0.05911, over 20638.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01147, ecapa_loss=0.000203, whisper_loss=0.09193, over 3815631.15 frames. ], batch size: 86, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:05:08,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-08-11 03:05:26,697 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 03:05:29,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=882190.0, ans=0.0 2024-08-11 03:05:43,525 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 03:05:51,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=15.0 2024-08-11 03:06:04,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=882490.0, ans=0.0 2024-08-11 03:06:07,036 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 03:06:08,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=882490.0, ans=0.125 2024-08-11 03:06:20,279 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1300, loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0002377, whisper_loss=0.08901, over 22043.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01151, ecapa_loss=0.0002022, whisper_loss=0.09183, over 3808867.50 frames. ], batch size: 89, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:06:30,567 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 03:06:32,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=882590.0, ans=0.125 2024-08-11 03:06:33,846 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 03:06:38,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882690.0, ans=0.125 2024-08-11 03:06:42,317 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 40 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 03:06:47,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=882690.0, ans=0.1 2024-08-11 03:06:58,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=882790.0, ans=0.125 2024-08-11 03:07:03,609 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 03:07:11,121 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-11 03:07:12,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=882890.0, ans=0.2 2024-08-11 03:07:20,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.642e+01 3.016e+01 3.566e+01 8.330e+01, threshold=6.031e+01, percent-clipped=1.0 2024-08-11 03:07:22,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=882990.0, ans=0.125 2024-08-11 03:07:33,441 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-11 03:07:34,483 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1350, loss[loss=0.1171, beats_loss=0.006616, ecapa_loss=0.0002304, whisper_loss=0.1082, over 15042.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01147, ecapa_loss=0.0001998, whisper_loss=0.09174, over 3796613.84 frames. ], batch size: 55, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:07:41,165 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:08:09,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=883290.0, ans=0.0 2024-08-11 03:08:24,839 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:08:35,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=883490.0, ans=0.07 2024-08-11 03:08:36,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. 
limit=6.0 2024-08-11 03:08:47,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=883590.0, ans=0.0 2024-08-11 03:08:48,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1400, loss[loss=0.103, beats_loss=0.01221, ecapa_loss=0.0002406, whisper_loss=0.08841, over 20666.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01148, ecapa_loss=0.0001996, whisper_loss=0.09157, over 3797476.43 frames. ], batch size: 90, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:08:56,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=883590.0, ans=0.05 2024-08-11 03:09:06,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=883690.0, ans=0.125 2024-08-11 03:09:07,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=883690.0, ans=0.125 2024-08-11 03:09:16,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.206e-02 2024-08-11 03:09:18,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883790.0, ans=0.1 2024-08-11 03:09:49,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.657e+01 3.071e+01 3.496e+01 6.029e+01, threshold=6.143e+01, percent-clipped=0.0 2024-08-11 03:10:37,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1450, loss[loss=0.08297, beats_loss=0.01059, ecapa_loss=0.0002517, whisper_loss=0.06986, over 18390.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01148, ecapa_loss=0.0001988, whisper_loss=0.09171, over 3791249.79 frames. 
], batch size: 78, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:10:38,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-11 03:10:43,949 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 03:10:49,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884090.0, ans=0.125 2024-08-11 03:10:56,132 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 03:10:56,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=884190.0, ans=0.125 2024-08-11 03:11:15,499 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 03:11:23,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=884390.0, ans=0.125 2024-08-11 03:11:30,450 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 03:11:32,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=884390.0, ans=0.0 2024-08-11 03:11:53,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1500, loss[loss=0.1081, beats_loss=0.01049, ecapa_loss=0.0002453, whisper_loss=0.09519, over 20352.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01152, ecapa_loss=0.0001988, whisper_loss=0.09132, over 3790389.39 frames. 
], batch size: 85, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:12:03,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=884590.0, ans=0.025 2024-08-11 03:12:05,918 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 03:12:09,121 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 03:12:15,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884690.0, ans=0.1 2024-08-11 03:12:24,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=884790.0, ans=0.0 2024-08-11 03:12:26,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=884790.0, ans=0.125 2024-08-11 03:12:30,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884790.0, ans=0.1 2024-08-11 03:12:32,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=884790.0, ans=0.0 2024-08-11 03:12:35,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2024-08-11 03:12:48,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. 
limit=6.0 2024-08-11 03:12:53,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.731e+01 3.107e+01 3.593e+01 6.683e+01, threshold=6.214e+01, percent-clipped=1.0 2024-08-11 03:12:54,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=884990.0, ans=0.0 2024-08-11 03:12:56,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=884990.0, ans=0.125 2024-08-11 03:13:04,108 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 03:13:07,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1550, loss[loss=0.07762, beats_loss=0.01397, ecapa_loss=0.0002204, whisper_loss=0.06144, over 20842.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01144, ecapa_loss=0.0001995, whisper_loss=0.09207, over 3804042.73 frames. ], batch size: 89, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:13:15,516 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 03:13:20,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=885090.0, ans=0.0 2024-08-11 03:13:24,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885190.0, ans=0.125 2024-08-11 03:13:34,866 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.071e-02 2024-08-11 03:13:54,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=885390.0, ans=0.0 2024-08-11 03:14:14,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=885490.0, ans=0.125 2024-08-11 03:14:21,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1600, loss[loss=0.1151, beats_loss=0.01142, ecapa_loss=0.0001686, whisper_loss=0.102, over 17375.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0114, ecapa_loss=0.0001989, whisper_loss=0.09292, over 3797563.94 frames. ], batch size: 63, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:14:38,136 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 03:14:40,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.77 vs. 
limit=22.5 2024-08-11 03:14:54,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885790.0, ans=0.1 2024-08-11 03:15:12,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=885890.0, ans=0.125 2024-08-11 03:15:21,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.608e+01 2.973e+01 3.361e+01 6.559e+01, threshold=5.946e+01, percent-clipped=1.0 2024-08-11 03:15:23,290 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 03:15:33,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=886090.0, ans=0.125 2024-08-11 03:15:34,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1650, loss[loss=0.1095, beats_loss=0.01061, ecapa_loss=0.0001863, whisper_loss=0.09707, over 22655.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0001994, whisper_loss=0.09317, over 3798838.20 frames. ], batch size: 90, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:15:49,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=886190.0, ans=0.125 2024-08-11 03:15:59,501 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 03:16:07,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.11 vs. limit=22.5 2024-08-11 03:16:10,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. 
limit=10.0 2024-08-11 03:16:32,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=886490.0, ans=0.125 2024-08-11 03:16:32,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=886490.0, ans=0.125 2024-08-11 03:16:33,136 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 03:16:44,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1700, loss[loss=0.1348, beats_loss=0.009297, ecapa_loss=0.0002115, whisper_loss=0.1234, over 15258.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01129, ecapa_loss=0.0001993, whisper_loss=0.09341, over 3799116.29 frames. ], batch size: 59, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:17:03,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=886690.0, ans=0.2 2024-08-11 03:17:15,547 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 03:17:18,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=886790.0, ans=0.125 2024-08-11 03:17:21,229 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 03:17:26,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=15.0 2024-08-11 03:17:35,656 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 03:17:42,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.697e+01 3.081e+01 3.373e+01 4.997e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 03:17:49,856 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 03:17:55,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1750, loss[loss=0.1295, beats_loss=0.007289, ecapa_loss=0.0002645, whisper_loss=0.1195, over 17683.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01135, ecapa_loss=0.0001974, whisper_loss=0.09382, over 3813979.57 frames. ], batch size: 72, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:18:04,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-08-11 03:18:08,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887190.0, ans=0.1 2024-08-11 03:18:10,715 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 03:18:14,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=887190.0, ans=0.2 2024-08-11 03:18:25,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=887290.0, ans=0.0 2024-08-11 03:18:26,412 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 03:18:32,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=887290.0, ans=0.125 2024-08-11 03:18:40,869 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 03:18:47,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.16 vs. limit=6.0 2024-08-11 03:18:52,821 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 03:18:55,403 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 03:18:58,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887490.0, ans=0.1 2024-08-11 03:19:03,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1800, loss[loss=0.1216, beats_loss=0.009832, ecapa_loss=0.000196, whisper_loss=0.1098, over 17997.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01127, ecapa_loss=0.0001986, whisper_loss=0.09391, over 3788885.39 frames. ], batch size: 69, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:19:05,847 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 03:19:13,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=887590.0, ans=0.125 2024-08-11 03:19:24,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-11 03:19:39,228 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 03:19:49,346 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
20 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 03:19:53,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887890.0, ans=0.125 2024-08-11 03:19:53,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=887890.0, ans=0.125 2024-08-11 03:19:55,287 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.150e-02 2024-08-11 03:20:00,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.601e+01 2.973e+01 3.471e+01 4.949e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-11 03:20:01,973 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 03:20:03,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=887990.0, ans=0.0 2024-08-11 03:20:13,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1850, loss[loss=0.114, beats_loss=0.01054, ecapa_loss=0.0002299, whisper_loss=0.1012, over 22663.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01129, ecapa_loss=0.0001996, whisper_loss=0.09406, over 3809830.95 frames. 
], batch size: 93, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:20:16,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888090.0, ans=0.125 2024-08-11 03:20:21,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=888090.0, ans=0.125 2024-08-11 03:20:40,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=888290.0, ans=0.125 2024-08-11 03:21:05,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=888390.0, ans=0.125 2024-08-11 03:21:09,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=888490.0, ans=0.2 2024-08-11 03:21:15,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=888490.0, ans=0.125 2024-08-11 03:21:22,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1900, loss[loss=0.1211, beats_loss=0.01172, ecapa_loss=0.0001957, whisper_loss=0.1074, over 16755.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01134, ecapa_loss=0.0001992, whisper_loss=0.09425, over 3839142.73 frames. ], batch size: 66, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:21:31,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=888590.0, ans=0.125 2024-08-11 03:21:34,559 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 03:21:41,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=888690.0, ans=0.1 2024-08-11 03:21:41,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=888690.0, ans=0.125 2024-08-11 03:21:52,937 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 03:21:58,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:22:09,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2024-08-11 03:22:16,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.590e+01 3.002e+01 3.327e+01 6.064e+01, threshold=6.004e+01, percent-clipped=1.0 2024-08-11 03:22:24,529 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 03:22:30,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 1950, loss[loss=0.09705, beats_loss=0.01467, ecapa_loss=0.0001705, whisper_loss=0.08067, over 23071.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0002036, whisper_loss=0.09359, over 3836329.98 frames. ], batch size: 91, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:22:31,822 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 03:22:40,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=889090.0, ans=0.125 2024-08-11 03:22:45,076 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-11 03:23:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=889390.0, ans=0.09899494936611666 2024-08-11 03:23:37,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=889590.0, ans=0.0 2024-08-11 03:23:38,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2000, loss[loss=0.1279, beats_loss=0.009533, ecapa_loss=0.0002432, whisper_loss=0.1159, over 22704.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01126, ecapa_loss=0.0002053, whisper_loss=0.09383, over 3840785.02 frames. ], batch size: 89, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:23:41,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=889590.0, ans=0.125 2024-08-11 03:23:41,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=889590.0, ans=0.125 2024-08-11 03:23:43,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=889590.0, ans=0.125 2024-08-11 03:23:51,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=889690.0, ans=0.0 2024-08-11 03:23:54,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=889690.0, ans=0.0 2024-08-11 03:23:59,851 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:24:06,389 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 03:24:13,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=889790.0, ans=0.125 2024-08-11 03:24:20,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-11 03:24:34,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.753e+01 3.127e+01 3.595e+01 5.672e+01, threshold=6.254e+01, percent-clipped=0.0 2024-08-11 03:24:36,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=889990.0, ans=0.2 2024-08-11 03:24:47,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2050, loss[loss=0.09565, beats_loss=0.01118, ecapa_loss=0.0002058, whisper_loss=0.08241, over 19045.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01127, ecapa_loss=0.0002068, whisper_loss=0.09331, over 3838546.10 frames. ], batch size: 77, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:25:00,269 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 03:25:01,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-11 03:25:01,767 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 03:25:11,747 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 03:25:30,339 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 03:25:39,021 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 03:26:00,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2100, loss[loss=0.1402, beats_loss=0.008422, ecapa_loss=0.0002444, whisper_loss=0.1293, over 22494.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01128, ecapa_loss=0.000207, whisper_loss=0.09328, over 3838318.69 frames. ], batch size: 88, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:26:08,850 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-11 03:26:15,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-11 03:26:21,504 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 03:26:31,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=890790.0, ans=0.2 2024-08-11 03:26:33,866 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 03:26:54,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=890890.0, ans=0.09899494936611666 2024-08-11 03:27:01,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.655e+01 3.007e+01 3.449e+01 4.820e+01, threshold=6.014e+01, percent-clipped=0.0 2024-08-11 03:27:04,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890990.0, ans=0.125 2024-08-11 03:27:04,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890990.0, ans=0.1 2024-08-11 03:27:14,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2150, loss[loss=0.09976, beats_loss=0.01164, ecapa_loss=0.0002564, whisper_loss=0.08556, over 17586.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0114, ecapa_loss=0.0002064, whisper_loss=0.09267, over 3824655.00 frames. ], batch size: 71, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:27:14,444 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 03:27:28,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=891190.0, ans=0.2 2024-08-11 03:27:48,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891290.0, ans=0.1 2024-08-11 03:27:48,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=891290.0, ans=0.125 2024-08-11 03:27:59,459 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 03:28:08,053 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 03:28:11,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891490.0, ans=0.125 2024-08-11 03:28:16,586 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-11 03:28:20,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=891490.0, ans=0.2 2024-08-11 03:28:21,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=891490.0, ans=0.125 2024-08-11 03:28:26,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2200, loss[loss=0.1187, beats_loss=0.01221, ecapa_loss=0.0001899, whisper_loss=0.1046, over 22729.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01147, ecapa_loss=0.0002051, whisper_loss=0.09305, over 3836521.56 frames. ], batch size: 87, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:28:31,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=891590.0, ans=0.0 2024-08-11 03:28:36,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=891590.0, ans=0.125 2024-08-11 03:28:40,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891690.0, ans=0.1 2024-08-11 03:28:45,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=891690.0, ans=0.125 2024-08-11 03:29:01,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=891790.0, ans=0.125 2024-08-11 03:29:06,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, 
batch_count=891790.0, ans=0.1 2024-08-11 03:29:09,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=891890.0, ans=0.0 2024-08-11 03:29:11,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=891890.0, ans=0.125 2024-08-11 03:29:14,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=891890.0, ans=0.0 2024-08-11 03:29:18,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=891890.0, ans=0.2 2024-08-11 03:29:23,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-11 03:29:27,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.671e+01 3.021e+01 3.496e+01 5.518e+01, threshold=6.042e+01, percent-clipped=0.0 2024-08-11 03:29:29,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891990.0, ans=0.1 2024-08-11 03:29:40,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2250, loss[loss=0.1133, beats_loss=0.01217, ecapa_loss=0.0002133, whisper_loss=0.09897, over 16394.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01151, ecapa_loss=0.0002068, whisper_loss=0.0935, over 3842608.91 frames. ], batch size: 63, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:29:48,586 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 03:30:23,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. 
limit=15.0 2024-08-11 03:30:36,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-11 03:30:42,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=892490.0, ans=0.125 2024-08-11 03:30:45,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=892490.0, ans=0.0 2024-08-11 03:30:48,198 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 03:30:52,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2300, loss[loss=0.1086, beats_loss=0.01024, ecapa_loss=0.0001622, whisper_loss=0.09674, over 21128.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01149, ecapa_loss=0.0002072, whisper_loss=0.09369, over 3836523.47 frames. ], batch size: 78, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:30:58,330 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 03:31:02,850 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-11 03:31:07,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=892690.0, ans=0.125 2024-08-11 03:31:20,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=892690.0, ans=0.07 2024-08-11 03:31:25,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=892790.0, ans=0.0 2024-08-11 03:31:47,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=892890.0, ans=0.125 2024-08-11 03:31:49,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=892890.0, ans=0.05 2024-08-11 03:31:54,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=892990.0, ans=0.125 2024-08-11 03:31:54,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=892990.0, ans=0.0 2024-08-11 03:31:54,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.862e+01 3.270e+01 3.564e+01 5.997e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-11 03:32:01,191 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 03:32:09,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2350, loss[loss=0.07171, beats_loss=0.01336, ecapa_loss=0.0002187, whisper_loss=0.05617, over 16345.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01137, ecapa_loss=0.0002086, whisper_loss=0.094, over 3793990.64 frames. ], batch size: 67, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:32:11,146 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 03:32:12,959 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 03:32:40,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=893290.0, ans=0.1 2024-08-11 03:32:45,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:32:53,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:32:56,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=893390.0, ans=0.0 2024-08-11 03:33:10,709 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 03:33:12,587 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 03:33:22,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=893490.0, ans=0.0 2024-08-11 03:33:24,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0 2024-08-11 03:33:25,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2400, loss[loss=0.1151, beats_loss=0.01113, ecapa_loss=0.0002393, whisper_loss=0.1015, over 20872.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01131, ecapa_loss=0.0002113, whisper_loss=0.09405, over 3798886.83 frames. ], batch size: 88, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:33:35,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.89 vs. 
limit=15.0 2024-08-11 03:33:41,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=893690.0, ans=0.125 2024-08-11 03:33:48,343 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 03:34:08,211 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 03:34:09,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-08-11 03:34:17,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2024-08-11 03:34:28,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.589e+01 2.936e+01 3.311e+01 5.160e+01, threshold=5.871e+01, percent-clipped=0.0 2024-08-11 03:34:31,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.271e+00 2024-08-11 03:34:44,114 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2450, loss[loss=0.1251, beats_loss=0.01075, ecapa_loss=0.0001933, whisper_loss=0.1124, over 23825.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01129, ecapa_loss=0.0002114, whisper_loss=0.09409, over 3822634.95 frames. ], batch size: 91, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:34:59,346 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 03:35:08,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=894190.0, ans=0.0 2024-08-11 03:35:09,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. 
limit=22.5 2024-08-11 03:35:20,376 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 03:35:20,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=894290.0, ans=0.2 2024-08-11 03:35:23,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=894290.0, ans=0.125 2024-08-11 03:35:23,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=894290.0, ans=0.0 2024-08-11 03:35:26,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=894290.0, ans=0.125 2024-08-11 03:35:28,905 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 03:35:38,069 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 03:35:41,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=894390.0, ans=0.2 2024-08-11 03:35:58,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2500, loss[loss=0.1265, beats_loss=0.009921, ecapa_loss=0.0001736, whisper_loss=0.1148, over 22255.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0002114, whisper_loss=0.09348, over 3806854.12 frames. ], batch size: 83, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:35:58,649 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 29 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 03:36:03,350 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 03:36:11,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=894590.0, ans=0.125 2024-08-11 03:36:28,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.85 vs. limit=10.0 2024-08-11 03:36:29,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=894790.0, ans=0.0 2024-08-11 03:36:29,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=894790.0, ans=0.125 2024-08-11 03:36:30,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-08-11 03:36:31,650 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 03:36:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894890.0, ans=0.1 2024-08-11 03:37:03,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.710e+01 3.039e+01 3.423e+01 5.787e+01, threshold=6.079e+01, percent-clipped=0.0 2024-08-11 03:37:08,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=894990.0, ans=0.95 2024-08-11 03:37:13,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=894990.0, ans=0.125 2024-08-11 03:37:16,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2550, loss[loss=0.1079, beats_loss=0.0104, ecapa_loss=0.0002497, whisper_loss=0.09504, over 22118.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01128, ecapa_loss=0.0002114, whisper_loss=0.09405, over 3828890.75 frames. ], batch size: 94, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:37:19,740 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 03:37:34,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2024-08-11 03:37:35,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=895190.0, ans=0.95 2024-08-11 03:37:56,280 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 03:38:05,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=895390.0, ans=15.0 2024-08-11 03:38:19,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895490.0, ans=0.1 2024-08-11 03:38:29,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=895490.0, ans=0.125 2024-08-11 03:38:33,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2600, loss[loss=0.111, beats_loss=0.0112, ecapa_loss=0.0002064, whisper_loss=0.0977, over 21001.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01134, ecapa_loss=0.0002109, whisper_loss=0.0937, over 3831390.77 frames. ], batch size: 86, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:38:34,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.88 vs. 
limit=15.0 2024-08-11 03:38:35,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=895590.0, ans=0.125 2024-08-11 03:39:01,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2024-08-11 03:39:22,855 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 03:39:26,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=895890.0, ans=0.0 2024-08-11 03:39:27,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=895890.0, ans=0.0 2024-08-11 03:39:30,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=895890.0, ans=0.125 2024-08-11 03:39:36,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.572e+01 2.896e+01 3.197e+01 4.923e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 03:39:37,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=895990.0, ans=0.125 2024-08-11 03:39:38,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0 2024-08-11 03:39:50,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2650, loss[loss=0.1066, beats_loss=0.01118, ecapa_loss=0.0002343, whisper_loss=0.09311, over 21230.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002097, whisper_loss=0.09361, over 3868375.81 frames. 
], batch size: 89, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:39:57,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=896090.0, ans=0.0 2024-08-11 03:40:00,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=896090.0, ans=0.07 2024-08-11 03:40:14,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896190.0, ans=0.125 2024-08-11 03:40:22,432 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 03:40:27,393 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 from AS 2024-08-11 03:40:33,344 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS 2024-08-11 03:40:47,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=896390.0, ans=0.0 2024-08-11 03:41:02,224 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 from AS 2024-08-11 03:41:04,854 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-11 03:41:05,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2700, loss[loss=0.1085, beats_loss=0.01342, ecapa_loss=0.0002114, whisper_loss=0.09293, over 22228.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002118, whisper_loss=0.09361, over 3875663.40 frames. ], batch size: 91, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:41:15,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-11 03:41:16,265 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
14 from LS+wenet, 21 from Vox, 24 from AS 2024-08-11 03:41:24,441 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 from AS 2024-08-11 03:41:47,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=896790.0, ans=0.125 2024-08-11 03:41:49,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=896790.0, ans=0.125 2024-08-11 03:41:51,612 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 from AS 2024-08-11 03:41:55,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896890.0, ans=0.1 2024-08-11 03:42:03,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=896890.0, ans=0.07 2024-08-11 03:42:07,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=896890.0, ans=0.2 2024-08-11 03:42:11,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.641e+01 2.955e+01 3.583e+01 6.037e+01, threshold=5.910e+01, percent-clipped=1.0 2024-08-11 03:42:25,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2750, loss[loss=0.09626, beats_loss=0.01001, ecapa_loss=0.0001845, whisper_loss=0.08441, over 15243.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01133, ecapa_loss=0.0002112, whisper_loss=0.09401, over 3880577.34 frames. ], batch size: 58, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:42:36,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.86 vs.
limit=15.0 2024-08-11 03:42:45,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=897190.0, ans=0.025 2024-08-11 03:42:54,904 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 23 from Vox, 32 from AS 2024-08-11 03:43:11,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=897390.0, ans=0.125 2024-08-11 03:43:20,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=897390.0, ans=0.125 2024-08-11 03:43:23,493 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS 2024-08-11 03:43:25,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=897390.0, ans=0.09899494936611666 2024-08-11 03:43:40,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-11 03:43:43,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2800, loss[loss=0.1086, beats_loss=0.009, ecapa_loss=0.0002676, whisper_loss=0.09688, over 18344.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01141, ecapa_loss=0.00021, whisper_loss=0.09402, over 3895604.46 frames. ], batch size: 76, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:43:54,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=897590.0, ans=0.125 2024-08-11 03:43:55,832 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 03:44:30,961 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-11 03:44:48,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.710e+01 2.962e+01 3.650e+01 5.339e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 03:45:02,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2850, loss[loss=0.08409, beats_loss=0.01069, ecapa_loss=0.0002749, whisper_loss=0.07065, over 14340.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01148, ecapa_loss=0.0002098, whisper_loss=0.09387, over 3896833.61 frames. ], batch size: 62, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:45:27,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=898190.0, ans=0.5 2024-08-11 03:45:30,879 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 03:45:36,720 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 27 from LS+wenet, 14 from Vox, 14 from AS 2024-08-11 03:45:41,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2024-08-11 03:45:54,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=22.5 2024-08-11 03:46:07,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=898490.0, ans=0.035 2024-08-11 03:46:12,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=898490.0, ans=0.125 2024-08-11 03:46:15,295 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
20 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 03:46:15,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=898490.0, ans=0.0 2024-08-11 03:46:25,182 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2900, loss[loss=0.1271, beats_loss=0.008775, ecapa_loss=0.0002542, whisper_loss=0.1158, over 15510.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01141, ecapa_loss=0.0002107, whisper_loss=0.09443, over 3888717.37 frames. ], batch size: 62, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:46:26,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2024-08-11 03:47:13,402 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 14 from Vox, 20 from AS 2024-08-11 03:47:25,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=898890.0, ans=0.0 2024-08-11 03:47:30,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2024-08-11 03:47:30,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.659e+01 2.989e+01 3.721e+01 7.203e+01, threshold=5.978e+01, percent-clipped=1.0 2024-08-11 03:47:33,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=898990.0, ans=0.0 2024-08-11 03:47:39,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-11 03:47:44,931 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
19 from LS+wenet, 15 from Vox, 21 from AS 2024-08-11 03:47:45,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 2950, loss[loss=0.1245, beats_loss=0.01049, ecapa_loss=0.0001893, whisper_loss=0.1121, over 15010.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01144, ecapa_loss=0.0002108, whisper_loss=0.09424, over 3887013.59 frames. ], batch size: 55, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:47:47,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=899090.0, ans=0.09899494936611666 2024-08-11 03:47:51,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=12.0 2024-08-11 03:48:03,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899190.0, ans=0.0 2024-08-11 03:48:05,915 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 from AS 2024-08-11 03:48:11,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899190.0, ans=0.125 2024-08-11 03:48:19,062 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 22 from Vox, 28 from AS 2024-08-11 03:48:19,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899290.0, ans=0.125 2024-08-11 03:48:30,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=899290.0, ans=0.125 2024-08-11 03:48:39,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs.
limit=15.0 2024-08-11 03:48:43,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=899390.0, ans=0.0 2024-08-11 03:49:08,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3000, loss[loss=0.1061, beats_loss=0.009495, ecapa_loss=0.0001828, whisper_loss=0.09477, over 22538.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01142, ecapa_loss=0.0002113, whisper_loss=0.09386, over 3871855.28 frames. ], batch size: 88, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:49:08,471 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 03:49:48,997 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006718, whisper_loss=0.2519, over 922467.00 frames. 2024-08-11 03:50:07,493 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on SV_voxceleb1: loss=0.005617, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0, over 939242.00 frames. 2024-08-11 03:51:26,727 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.8762, 2.1676, 2.2037, 2.0034], device='cuda:3') 2024-08-11 03:52:03,542 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on AT_audioset: loss=0.02572, beats_loss=0.02572, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
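The per-batch entries above can be cross-checked against the run configuration printed at the start of this log (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0, Clipping_scale=2.0). A minimal sketch of how the reported numbers appear to fit together; the helper names and the exact weighted-sum form are assumptions inferred from the logged values, not code from train_multi_KD3.py:

```python
# Assumed scales, taken from the config dump in the log header.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def tot_loss(beats_loss, ecapa_loss, whisper_loss):
    """Weighted sum that reproduces the logged tot_loss values."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

def clip_threshold(median_grad_norm, clipping_scale=2.0):
    """The logged clipping threshold matches Clipping_scale times the
    median (middle quartile) of the reported grad-norm quartiles."""
    return clipping_scale * median_grad_norm

# First tot_loss entry in this section:
# loss=0.1074, beats_loss=0.01128, ecapa_loss=0.0002114, whisper_loss=0.09405
print(round(tot_loss(0.01128, 0.0002114, 0.09405), 4))  # 0.1074

# First grad-norm entry: quartiles ... 2.896e+01 ..., threshold=5.792e+01
print(clip_threshold(2.896e+01))  # 57.92
```

The same check holds for the other entries, e.g. threshold=5.910e+01 is twice the median quartile 2.955e+01 in the next optim.py line.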
2024-08-11 03:52:03,546 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 03:52:03,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=899590.0, ans=0.05 2024-08-11 03:52:24,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=899690.0, ans=0.125 2024-08-11 03:52:34,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=899790.0, ans=0.125 2024-08-11 03:52:36,279 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 03:52:38,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899790.0, ans=0.125 2024-08-11 03:52:56,177 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 03:53:15,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.673e+01 3.038e+01 3.538e+01 6.757e+01, threshold=6.077e+01, percent-clipped=1.0 2024-08-11 03:53:18,887 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 15 from Vox, 43 from AS 2024-08-11 03:53:30,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3050, loss[loss=0.09948, beats_loss=0.01192, ecapa_loss=0.000214, whisper_loss=0.08542, over 16959.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01154, ecapa_loss=0.0002097, whisper_loss=0.09361, over 3890316.10 frames.
], batch size: 70, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:53:40,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=900090.0, ans=0.0 2024-08-11 03:53:42,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=900090.0, ans=0.0 2024-08-11 03:53:55,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=900190.0, ans=0.0 2024-08-11 03:54:05,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=900190.0, ans=0.035 2024-08-11 03:54:05,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=900190.0, ans=0.2 2024-08-11 03:54:10,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=900290.0, ans=0.125 2024-08-11 03:54:13,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=900290.0, ans=0.2 2024-08-11 03:54:22,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:54:24,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=900390.0, ans=10.0 2024-08-11 03:54:26,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=900390.0, ans=0.0 2024-08-11 03:54:42,414 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 31 from Vox, 35 from AS 2024-08-11 03:54:44,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=900490.0, ans=0.125 2024-08-11 03:54:57,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3100, loss[loss=0.1032, beats_loss=0.01403, ecapa_loss=0.0001757, whisper_loss=0.08746, over 19496.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01154, ecapa_loss=0.0002094, whisper_loss=0.09398, over 3906545.86 frames. ], batch size: 79, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:55:19,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=900690.0, ans=0.0 2024-08-11 03:55:23,181 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 from AS 2024-08-11 03:55:23,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=900690.0, ans=0.1 2024-08-11 03:55:36,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=900790.0, ans=0.125 2024-08-11 03:55:36,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=15.0 2024-08-11 03:55:39,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-08-11 03:55:45,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=900890.0, ans=0.0 2024-08-11 03:55:50,749 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 from AS 2024-08-11 03:56:02,424 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
17 from LS+wenet, 19 from Vox, 32 from AS 2024-08-11 03:56:05,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.642e+01 2.994e+01 3.477e+01 5.395e+01, threshold=5.988e+01, percent-clipped=0.0 2024-08-11 03:56:16,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=900990.0, ans=0.125 2024-08-11 03:56:19,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=901090.0, ans=0.0 2024-08-11 03:56:20,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3150, loss[loss=0.0983, beats_loss=0.01056, ecapa_loss=0.0002507, whisper_loss=0.08523, over 21572.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01151, ecapa_loss=0.0002089, whisper_loss=0.0938, over 3881977.10 frames. ], batch size: 90, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:56:25,573 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 03:56:42,874 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 31 from Vox, 27 from AS 2024-08-11 03:56:50,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2024-08-11 03:57:07,042 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 from AS 2024-08-11 03:57:10,598 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 03:57:17,347 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
18 from LS+wenet, 17 from Vox, 37 from AS 2024-08-11 03:57:36,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=901490.0, ans=0.2 2024-08-11 03:57:44,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3200, loss[loss=0.1157, beats_loss=0.009605, ecapa_loss=0.000231, whisper_loss=0.1038, over 19756.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01148, ecapa_loss=0.000212, whisper_loss=0.09475, over 3911566.44 frames. ], batch size: 78, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:57:44,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=901590.0, ans=0.2 2024-08-11 03:57:50,663 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 from AS 2024-08-11 03:58:19,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-11 03:58:37,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901890.0, ans=0.1 2024-08-11 03:58:51,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.680e+01 2.966e+01 3.598e+01 6.746e+01, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 03:58:56,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-11 03:58:58,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=901990.0, ans=0.125 2024-08-11 03:59:07,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3250, loss[loss=0.1192, beats_loss=0.01036, ecapa_loss=0.0002245, whisper_loss=0.1066, over 18434.00 frames.
], tot_loss[loss=0.1083, beats_loss=0.01152, ecapa_loss=0.0002125, whisper_loss=0.09466, over 3899706.46 frames. ], batch size: 73, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 03:59:28,754 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 from AS 2024-08-11 03:59:38,299 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 from AS 2024-08-11 03:59:40,081 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 03:59:50,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=902290.0, ans=0.09899494936611666 2024-08-11 04:00:03,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=902390.0, ans=0.125 2024-08-11 04:00:17,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=902490.0, ans=0.0 2024-08-11 04:00:25,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3300, loss[loss=0.1105, beats_loss=0.009191, ecapa_loss=0.0002503, whisper_loss=0.09885, over 17842.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0115, ecapa_loss=0.0002123, whisper_loss=0.09437, over 3866593.91 frames. ], batch size: 74, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:00:34,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=902590.0, ans=0.0 2024-08-11 04:00:45,552 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.026e-03 2024-08-11 04:00:47,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs.
limit=22.5 2024-08-11 04:01:02,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=902790.0, ans=0.0 2024-08-11 04:01:08,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=902790.0, ans=0.5 2024-08-11 04:01:10,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-08-11 04:01:28,885 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS 2024-08-11 04:01:31,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2024-08-11 04:01:38,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.725e+01 3.245e+01 3.907e+01 7.359e+01, threshold=6.490e+01, percent-clipped=2.0 2024-08-11 04:01:52,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3350, loss[loss=0.1086, beats_loss=0.01149, ecapa_loss=0.0002205, whisper_loss=0.09491, over 21045.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.0002116, whisper_loss=0.09407, over 3875633.91 frames. ], batch size: 88, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:01:52,769 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 04:01:55,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2024-08-11 04:02:00,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-08-11 04:02:24,448 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
21 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 04:02:36,336 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.828e-01 2024-08-11 04:02:53,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=903390.0, ans=0.2 2024-08-11 04:02:56,877 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 18 from Vox, 46 from AS 2024-08-11 04:02:57,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=903490.0, ans=0.05 2024-08-11 04:03:03,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=903490.0, ans=0.0 2024-08-11 04:03:13,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3400, loss[loss=0.1226, beats_loss=0.0116, ecapa_loss=0.0002172, whisper_loss=0.1088, over 22916.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0115, ecapa_loss=0.0002096, whisper_loss=0.09451, over 3903683.04 frames. ], batch size: 94, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:03:17,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=903590.0, ans=0.125 2024-08-11 04:03:19,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=903590.0, ans=0.125 2024-08-11 04:03:21,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-11 04:03:24,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=903590.0, ans=0.025 2024-08-11 04:03:32,715 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
25 from LS+wenet, 23 from Vox, 27 from AS 2024-08-11 04:03:45,056 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 24 from LS+wenet, 27 from Vox, 44 from AS 2024-08-11 04:03:57,757 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 17 from Vox, 36 from AS 2024-08-11 04:04:07,388 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 04:04:08,597 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 04:04:18,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.715e+01 3.132e+01 3.599e+01 6.001e+01, threshold=6.265e+01, percent-clipped=0.0 2024-08-11 04:04:32,131 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3450, loss[loss=0.08959, beats_loss=0.0133, ecapa_loss=0.0001742, whisper_loss=0.07455, over 20855.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01154, ecapa_loss=0.000211, whisper_loss=0.0934, over 3888512.50 frames. ], batch size: 83, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:04:37,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-08-11 04:04:54,136 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 from AS 2024-08-11 04:04:57,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs.
limit=15.0 2024-08-11 04:05:03,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904290.0, ans=0.1 2024-08-11 04:05:13,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904290.0, ans=0.1 2024-08-11 04:05:27,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=904390.0, ans=0.125 2024-08-11 04:05:35,877 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 04:05:37,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=904490.0, ans=0.125 2024-08-11 04:05:40,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=904490.0, ans=0.125 2024-08-11 04:05:42,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3500, loss[loss=0.1175, beats_loss=0.007658, ecapa_loss=0.0002399, whisper_loss=0.1075, over 18900.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01151, ecapa_loss=0.0002119, whisper_loss=0.09367, over 3918247.49 frames. ], batch size: 75, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:05:49,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=904590.0, ans=0.0 2024-08-11 04:05:54,282 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 04:05:55,468 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 04:05:59,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=904690.0, ans=0.0 2024-08-11 04:06:11,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=904790.0, ans=0.0 2024-08-11 04:06:17,914 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 11 from Vox, 40 from AS 2024-08-11 04:06:18,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2024-08-11 04:06:25,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=904890.0, ans=0.05 2024-08-11 04:06:30,307 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 04:06:30,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=904890.0, ans=0.125 2024-08-11 04:06:35,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.786e+01 3.047e+01 3.456e+01 6.070e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 04:06:47,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3550, loss[loss=0.07993, beats_loss=0.01138, ecapa_loss=0.0002221, whisper_loss=0.06633, over 15766.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01155, ecapa_loss=0.0002113, whisper_loss=0.09322, over 3935457.46 frames. ], batch size: 67, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:07:13,489 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 from AS 2024-08-11 04:07:20,287 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 04:07:24,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905290.0, ans=0.125 2024-08-11 04:07:53,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3600, loss[loss=0.1084, beats_loss=0.01146, ecapa_loss=0.0002145, whisper_loss=0.09477, over 22926.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0002111, whisper_loss=0.09415, over 3920797.17 frames. ], batch size: 92, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:07:56,085 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 04:07:57,321 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-11 04:08:08,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=905690.0, ans=0.125 2024-08-11 04:08:18,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=905790.0, ans=0.125 2024-08-11 04:08:34,239 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 04:08:35,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=905890.0, ans=0.125 2024-08-11 04:08:47,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.715e+01 3.031e+01 3.500e+01 1.161e+02, threshold=6.062e+01, percent-clipped=1.0 2024-08-11 04:08:55,562 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 04:08:56,808 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 04:08:59,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3650, loss[loss=0.07881, beats_loss=0.01317, ecapa_loss=0.0002793, whisper_loss=0.06284, over 19908.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01147, ecapa_loss=0.0002116, whisper_loss=0.09386, over 3900587.57 frames. ], batch size: 88, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:09:08,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=906090.0, ans=0.125 2024-08-11 04:09:28,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2024-08-11 04:09:30,539 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 04:09:48,635 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 04:09:49,876 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 04:10:00,337 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 29 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 04:10:02,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-11 04:10:04,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3700, loss[loss=0.1044, beats_loss=0.01353, ecapa_loss=0.0001913, whisper_loss=0.08899, over 22348.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01151, ecapa_loss=0.0002095, whisper_loss=0.0939, over 3865695.64 frames. ], batch size: 91, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:10:11,051 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 04:10:16,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=906690.0, ans=0.2 2024-08-11 04:10:16,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=906690.0, ans=0.0 2024-08-11 04:10:18,082 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 04:10:26,956 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 04:10:33,852 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 04:10:41,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2024-08-11 04:10:44,620 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 04:10:47,057 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 04:10:58,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.695e+01 3.038e+01 3.419e+01 5.061e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-11 04:11:10,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3750, loss[loss=0.1052, beats_loss=0.01313, ecapa_loss=0.000145, whisper_loss=0.0906, over 20468.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01156, ecapa_loss=0.0002071, whisper_loss=0.09392, over 3873344.22 frames. ], batch size: 77, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:11:13,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. 
limit=22.5 2024-08-11 04:11:15,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=907090.0, ans=0.0 2024-08-11 04:11:22,683 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 04:11:24,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=907190.0, ans=0.125 2024-08-11 04:11:35,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=907290.0, ans=0.125 2024-08-11 04:11:42,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=907290.0, ans=0.125 2024-08-11 04:12:04,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=18.27 vs. limit=15.0 2024-08-11 04:12:14,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2024-08-11 04:12:16,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3800, loss[loss=0.104, beats_loss=0.01438, ecapa_loss=0.0001775, whisper_loss=0.08786, over 17647.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01154, ecapa_loss=0.0002078, whisper_loss=0.09353, over 3839419.44 frames. ], batch size: 71, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:12:18,967 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 04:12:25,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=907590.0, ans=0.0 2024-08-11 04:12:30,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=907690.0, ans=0.125 2024-08-11 04:12:32,841 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 04:12:33,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=907690.0, ans=0.0 2024-08-11 04:12:46,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=907790.0, ans=0.07 2024-08-11 04:12:48,318 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 04:12:56,354 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 04:13:03,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=907890.0, ans=0.125 2024-08-11 04:13:05,991 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 04:13:09,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.721e+01 2.981e+01 3.416e+01 8.567e+01, threshold=5.961e+01, percent-clipped=1.0 2024-08-11 04:13:22,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3850, loss[loss=0.1024, beats_loss=0.01287, ecapa_loss=0.0001723, whisper_loss=0.0878, over 19434.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01165, ecapa_loss=0.0002083, whisper_loss=0.09302, over 3865348.72 frames. ], batch size: 76, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:13:25,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=908090.0, ans=0.0 2024-08-11 04:14:31,721 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 24 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-11 04:14:32,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3900, loss[loss=0.1335, beats_loss=0.008708, ecapa_loss=0.0001825, whisper_loss=0.123, over 15702.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.0115, ecapa_loss=0.0002097, whisper_loss=0.09446, over 3857656.56 frames. ], batch size: 56, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:14:34,363 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 17 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-11 04:14:58,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908690.0, ans=0.1 2024-08-11 04:14:59,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=908690.0, ans=0.2 2024-08-11 04:15:08,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=908790.0, ans=0.1 2024-08-11 04:15:10,680 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 04:15:26,341 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-11 04:15:32,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-11 04:15:32,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.684e+01 3.033e+01 3.679e+01 6.201e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 04:15:38,812 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 04:15:45,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-11 04:15:45,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 3950, loss[loss=0.099, beats_loss=0.01367, ecapa_loss=0.0001493, whisper_loss=0.08383, over 22957.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01145, ecapa_loss=0.0002113, whisper_loss=0.09406, over 3859328.80 frames. ], batch size: 90, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:15:51,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-11 04:15:58,775 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 04:16:00,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909190.0, ans=0.1 2024-08-11 04:16:06,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=909190.0, ans=0.0 2024-08-11 04:16:13,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-11 04:16:14,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=909290.0, ans=0.0 2024-08-11 04:16:17,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=909290.0, ans=0.2 2024-08-11 04:16:17,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=909290.0, ans=0.0 2024-08-11 04:16:25,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=909290.0, ans=0.125 2024-08-11 04:16:30,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=909390.0, ans=0.2 2024-08-11 04:16:31,976 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
12 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 04:16:38,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=909390.0, ans=0.0 2024-08-11 04:16:41,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=909390.0, ans=0.125 2024-08-11 04:16:41,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-11 04:16:50,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=909490.0, ans=0.09899494936611666 2024-08-11 04:16:59,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4000, loss[loss=0.09261, beats_loss=0.01336, ecapa_loss=0.0002041, whisper_loss=0.07722, over 21076.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.000212, whisper_loss=0.09446, over 3869889.30 frames. ], batch size: 87, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:17:09,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=909590.0, ans=0.07 2024-08-11 04:17:40,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909790.0, ans=0.1 2024-08-11 04:17:48,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=909890.0, ans=0.0 2024-08-11 04:18:00,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.789e+01 3.181e+01 3.971e+01 6.202e+01, threshold=6.363e+01, percent-clipped=1.0 2024-08-11 04:18:15,026 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4050, loss[loss=0.1096, beats_loss=0.009961, ecapa_loss=0.0002268, whisper_loss=0.09741, over 20665.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01131, ecapa_loss=0.0002128, whisper_loss=0.09501, over 3875138.67 frames. ], batch size: 84, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:18:16,837 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 04:18:20,885 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 04:18:25,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=910090.0, ans=0.2 2024-08-11 04:18:28,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.76 vs. limit=10.0 2024-08-11 04:18:32,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=910190.0, ans=0.125 2024-08-11 04:18:46,736 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 04:18:54,135 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 04:18:54,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=910290.0, ans=0.0 2024-08-11 04:18:54,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. 
limit=15.0 2024-08-11 04:19:02,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=910390.0, ans=0.0 2024-08-11 04:19:20,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=910490.0, ans=0.125 2024-08-11 04:19:30,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4100, loss[loss=0.1063, beats_loss=0.01169, ecapa_loss=0.0002538, whisper_loss=0.09204, over 18963.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01132, ecapa_loss=0.000213, whisper_loss=0.09511, over 3881802.04 frames. ], batch size: 79, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:19:30,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=910590.0, ans=0.125 2024-08-11 04:19:37,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=910590.0, ans=22.5 2024-08-11 04:20:11,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-08-11 04:20:16,691 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 04:20:22,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=910890.0, ans=0.0 2024-08-11 04:20:26,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=910890.0, ans=0.5 2024-08-11 04:20:31,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.728e+01 2.968e+01 3.426e+01 6.142e+01, threshold=5.935e+01, percent-clipped=0.0 2024-08-11 04:20:40,740 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 04:20:46,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4150, loss[loss=0.1063, beats_loss=0.01256, ecapa_loss=0.0001827, whisper_loss=0.09189, over 20928.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01139, ecapa_loss=0.0002127, whisper_loss=0.09534, over 3893943.61 frames. ], batch size: 81, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:20:48,613 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 04:20:50,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-11 04:21:03,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=911190.0, ans=0.2 2024-08-11 04:21:06,113 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 04:21:08,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.96 vs. limit=5.0 2024-08-11 04:21:23,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=911290.0, ans=0.0 2024-08-11 04:21:26,452 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 04:21:30,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-11 04:21:57,289 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 04:22:02,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4200, loss[loss=0.09852, beats_loss=0.01037, ecapa_loss=0.0002199, whisper_loss=0.08596, over 19933.00 frames. 
], tot_loss[loss=0.1091, beats_loss=0.01137, ecapa_loss=0.0002124, whisper_loss=0.09559, over 3885508.31 frames. ], batch size: 79, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:22:07,014 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 04:22:23,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=911690.0, ans=0.125 2024-08-11 04:22:47,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911890.0, ans=0.1 2024-08-11 04:22:57,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=911890.0, ans=0.125 2024-08-11 04:23:01,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.712e+01 3.098e+01 3.462e+01 7.406e+01, threshold=6.196e+01, percent-clipped=1.0 2024-08-11 04:23:11,450 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 04:23:13,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.80 vs. limit=22.5 2024-08-11 04:23:13,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4250, loss[loss=0.1264, beats_loss=0.009096, ecapa_loss=0.0002235, whisper_loss=0.115, over 22976.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01144, ecapa_loss=0.0002115, whisper_loss=0.09489, over 3884755.18 frames. ], batch size: 91, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:23:18,498 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 04:23:22,956 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 04:23:23,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=912090.0, ans=0.125 2024-08-11 04:23:31,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912190.0, ans=0.1 2024-08-11 04:23:35,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=912190.0, ans=0.125 2024-08-11 04:23:42,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=912290.0, ans=0.07 2024-08-11 04:24:02,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2024-08-11 04:24:05,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=912390.0, ans=0.04949747468305833 2024-08-11 04:24:08,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2024-08-11 04:24:08,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=912490.0, ans=6.0 2024-08-11 04:24:09,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=912490.0, ans=0.05 2024-08-11 04:24:15,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=912490.0, ans=0.125 2024-08-11 04:24:18,638 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 04:24:22,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4300, loss[loss=0.1253, beats_loss=0.008274, ecapa_loss=0.0002119, whisper_loss=0.1149, over 20008.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01141, ecapa_loss=0.0002117, whisper_loss=0.09468, over 3872300.94 frames. ], batch size: 76, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:24:42,297 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 04:24:54,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=912790.0, ans=0.2 2024-08-11 04:25:12,123 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 04:25:16,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.652e+01 2.970e+01 3.355e+01 6.636e+01, threshold=5.939e+01, percent-clipped=1.0 2024-08-11 04:25:27,230 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 04:25:28,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4350, loss[loss=0.11, beats_loss=0.01287, ecapa_loss=0.0002166, whisper_loss=0.09495, over 21707.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01143, ecapa_loss=0.0002118, whisper_loss=0.09414, over 3852661.51 frames. ], batch size: 89, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:25:48,218 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 04:25:52,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=913190.0, ans=0.0 2024-08-11 04:25:58,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=913290.0, ans=0.1 2024-08-11 04:25:59,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-08-11 04:26:02,071 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 04:26:04,887 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 04:26:05,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=913290.0, ans=0.125 2024-08-11 04:26:30,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=913490.0, ans=0.125 2024-08-11 04:26:32,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-11 04:26:34,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4400, loss[loss=0.1205, beats_loss=0.01007, ecapa_loss=0.0001824, whisper_loss=0.1086, over 22894.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01142, ecapa_loss=0.0002105, whisper_loss=0.09447, over 3876708.54 frames. ], batch size: 90, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:26:39,397 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 04:26:42,175 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 04:26:44,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=913590.0, ans=0.125 2024-08-11 04:26:56,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=913690.0, ans=0.125 2024-08-11 04:27:04,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=913790.0, ans=0.0 2024-08-11 04:27:15,154 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 04:27:19,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=913890.0, ans=0.2 2024-08-11 04:27:19,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=913890.0, ans=0.1 2024-08-11 04:27:20,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=913890.0, ans=10.0 2024-08-11 04:27:20,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-11 04:27:25,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=913990.0, ans=0.0 2024-08-11 04:27:27,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.550e+01 2.848e+01 3.646e+01 5.843e+01, threshold=5.697e+01, percent-clipped=0.0 2024-08-11 04:27:36,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.55 vs. 
limit=15.0 2024-08-11 04:27:39,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4450, loss[loss=0.1071, beats_loss=0.01192, ecapa_loss=0.0002175, whisper_loss=0.09296, over 22632.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002101, whisper_loss=0.09355, over 3867336.02 frames. ], batch size: 94, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:27:40,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=914090.0, ans=0.0 2024-08-11 04:27:49,263 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 04:28:00,190 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 04:28:36,505 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 04:28:51,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4500, loss[loss=0.09889, beats_loss=0.01325, ecapa_loss=0.0002145, whisper_loss=0.0835, over 20276.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01153, ecapa_loss=0.0002083, whisper_loss=0.09369, over 3887400.95 frames. ], batch size: 79, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:28:51,880 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 04:28:54,854 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 04:29:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=914790.0, ans=0.0 2024-08-11 04:29:30,049 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 04:29:45,720 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 04:29:47,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-11 04:29:47,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.660e+01 3.113e+01 3.675e+01 6.136e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 04:29:49,645 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 04:29:57,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=914990.0, ans=0.0 2024-08-11 04:29:59,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4550, loss[loss=0.1206, beats_loss=0.01143, ecapa_loss=0.0002248, whisper_loss=0.107, over 21390.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0115, ecapa_loss=0.0002081, whisper_loss=0.09398, over 3905428.47 frames. ], batch size: 88, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:30:18,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=915190.0, ans=0.0 2024-08-11 04:30:29,417 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:30:40,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-08-11 04:30:42,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-11 04:30:46,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=915390.0, ans=0.125 2024-08-11 04:30:50,265 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 04:30:50,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=915390.0, ans=0.2 2024-08-11 04:30:54,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915490.0, ans=0.1 2024-08-11 04:30:55,529 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 04:30:58,407 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 04:31:05,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4600, loss[loss=0.08625, beats_loss=0.01166, ecapa_loss=0.0002424, whisper_loss=0.07217, over 19144.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0115, ecapa_loss=0.0002065, whisper_loss=0.09386, over 3917970.98 frames. ], batch size: 78, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:31:10,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=915590.0, ans=0.2 2024-08-11 04:31:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915590.0, ans=0.1 2024-08-11 04:31:18,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=915690.0, ans=0.125 2024-08-11 04:31:24,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=915690.0, ans=0.125 2024-08-11 04:31:31,867 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 04:31:33,297 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 13 from Vox, 51 fro AS 2024-08-11 04:31:40,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=12.0 2024-08-11 04:31:45,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=915890.0, ans=0.0 2024-08-11 04:31:47,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-08-11 04:31:59,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.820e+01 3.108e+01 3.626e+01 5.972e+01, threshold=6.216e+01, percent-clipped=0.0 2024-08-11 04:32:06,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2024-08-11 04:32:11,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4650, loss[loss=0.09638, beats_loss=0.01109, ecapa_loss=0.0002161, whisper_loss=0.08313, over 22301.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01146, ecapa_loss=0.0002084, whisper_loss=0.09348, over 3873469.51 frames. ], batch size: 90, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:32:15,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=916090.0, ans=0.125 2024-08-11 04:32:15,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=916090.0, ans=0.0 2024-08-11 04:32:19,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=15.0 2024-08-11 04:32:20,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=916090.0, ans=0.125 2024-08-11 04:32:23,291 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 04:32:28,587 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 04:32:33,078 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 04:33:10,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=916490.0, ans=0.1 2024-08-11 04:33:13,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=916490.0, ans=0.125 2024-08-11 04:33:17,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4700, loss[loss=0.1097, beats_loss=0.01091, ecapa_loss=0.0001874, whisper_loss=0.09691, over 24136.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.0002083, whisper_loss=0.09319, over 3867760.38 frames. ], batch size: 93, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:33:23,288 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 04:33:27,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=916590.0, ans=0.125 2024-08-11 04:33:33,653 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 04:33:41,604 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 04:33:49,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=916790.0, ans=0.0 2024-08-11 04:33:57,656 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 04:34:01,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=916890.0, ans=0.2 2024-08-11 04:34:02,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=916890.0, ans=0.0 2024-08-11 04:34:12,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.755e+01 3.145e+01 3.501e+01 4.476e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 04:34:14,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=916990.0, ans=0.0 2024-08-11 04:34:20,314 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 04:34:23,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4750, loss[loss=0.103, beats_loss=0.009529, ecapa_loss=0.0003044, whisper_loss=0.09038, over 13153.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01159, ecapa_loss=0.000208, whisper_loss=0.09328, over 3897739.53 frames. ], batch size: 55, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:34:25,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=917090.0, ans=0.0 2024-08-11 04:34:27,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=917090.0, ans=0.2 2024-08-11 04:34:27,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.27 vs. 
limit=15.0 2024-08-11 04:34:29,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=917090.0, ans=10.0 2024-08-11 04:34:35,389 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 04:34:54,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=917290.0, ans=0.125 2024-08-11 04:34:55,136 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 04:34:56,312 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-11 04:34:56,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=917290.0, ans=0.125 2024-08-11 04:35:28,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4800, loss[loss=0.1002, beats_loss=0.01094, ecapa_loss=0.0002634, whisper_loss=0.08664, over 13460.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01155, ecapa_loss=0.0002098, whisper_loss=0.09368, over 3921077.13 frames. ], batch size: 55, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:35:41,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-11 04:36:12,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-11 04:36:19,818 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 04:36:22,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.878e+01 3.268e+01 3.983e+01 7.610e+01, threshold=6.536e+01, percent-clipped=1.0 2024-08-11 04:36:25,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=917990.0, ans=0.07 2024-08-11 04:36:33,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2024-08-11 04:36:34,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4850, loss[loss=0.107, beats_loss=0.01316, ecapa_loss=0.0001816, whisper_loss=0.09206, over 22873.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01163, ecapa_loss=0.0002103, whisper_loss=0.09351, over 3929267.83 frames. ], batch size: 91, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:36:43,387 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 04:36:53,879 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
33 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 04:37:02,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:03,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:03,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:04,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:09,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=918290.0, ans=0.1 2024-08-11 04:37:12,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=918390.0, ans=0.125 2024-08-11 04:37:28,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918490.0, ans=0.1 2024-08-11 04:37:29,670 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 04:37:31,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=918490.0, ans=0.125 2024-08-11 04:37:38,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=918590.0, ans=0.125 2024-08-11 04:37:38,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4900, loss[loss=0.1054, beats_loss=0.01383, ecapa_loss=0.0002115, whisper_loss=0.08943, over 21950.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0116, ecapa_loss=0.0002108, whisper_loss=0.09397, over 3926215.31 frames. 
], batch size: 88, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:37:41,575 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 04:37:52,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=918690.0, ans=0.035 2024-08-11 04:37:53,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=918690.0, ans=0.04949747468305833 2024-08-11 04:37:53,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=15.0 2024-08-11 04:37:59,428 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 04:38:16,668 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 04:38:18,330 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-11 04:38:23,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=918890.0, ans=0.125 2024-08-11 04:38:26,868 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-11 04:38:31,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.611e+01 2.961e+01 3.443e+01 6.053e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 04:38:33,483 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 10 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 04:38:33,813 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.492e-01 2024-08-11 04:38:42,737 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 04:38:43,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 4950, loss[loss=0.1105, beats_loss=0.01225, ecapa_loss=0.0001967, whisper_loss=0.09631, over 21288.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01151, ecapa_loss=0.0002116, whisper_loss=0.09415, over 3887977.71 frames. ], batch size: 86, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:38:53,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=919090.0, ans=0.0 2024-08-11 04:38:57,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=919190.0, ans=0.0 2024-08-11 04:39:23,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=919390.0, ans=0.2 2024-08-11 04:39:25,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=919390.0, ans=0.2 2024-08-11 04:39:41,073 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:39:49,332 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-11 04:39:53,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5000, loss[loss=0.1014, beats_loss=0.008804, ecapa_loss=0.0002208, whisper_loss=0.09039, over 17344.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01149, ecapa_loss=0.0002112, whisper_loss=0.09406, over 3881768.39 frames. ], batch size: 66, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:39:59,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=919590.0, ans=0.125 2024-08-11 04:40:06,634 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 04:40:16,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=919690.0, ans=0.125 2024-08-11 04:40:27,801 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:40:32,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=919790.0, ans=0.0 2024-08-11 04:40:34,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=919890.0, ans=0.1 2024-08-11 04:40:46,739 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 17 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 04:40:47,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=919890.0, ans=0.125 2024-08-11 04:40:54,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.724e+01 2.983e+01 3.443e+01 5.585e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 04:40:57,579 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 04:41:06,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5050, loss[loss=0.1157, beats_loss=0.01124, ecapa_loss=0.0001777, whisper_loss=0.1026, over 17384.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01149, ecapa_loss=0.0002104, whisper_loss=0.09436, over 3876770.97 frames. ], batch size: 66, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:41:09,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-08-11 04:41:22,282 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 04:41:27,127 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 39 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 04:41:27,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-11 04:41:31,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=920190.0, ans=0.035 2024-08-11 04:41:41,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=920290.0, ans=0.0 2024-08-11 04:41:49,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=920290.0, ans=0.0 2024-08-11 04:42:13,623 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 04:42:18,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5100, loss[loss=0.08382, beats_loss=0.0145, ecapa_loss=0.0001798, whisper_loss=0.06752, over 14796.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01154, ecapa_loss=0.0002088, whisper_loss=0.09418, over 3880621.22 frames. ], batch size: 61, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:42:22,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-08-11 04:42:29,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=920590.0, ans=0.0 2024-08-11 04:42:33,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=920690.0, ans=6.0 2024-08-11 04:42:33,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-08-11 04:42:34,538 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 04:42:41,355 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 04:43:17,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.667e+01 3.175e+01 3.575e+01 5.874e+01, threshold=6.350e+01, percent-clipped=0.0 2024-08-11 04:43:22,174 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 04:43:31,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5150, loss[loss=0.1052, beats_loss=0.01157, ecapa_loss=0.0002296, whisper_loss=0.09131, over 20034.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01151, ecapa_loss=0.0002091, whisper_loss=0.09399, over 3887206.31 frames. ], batch size: 82, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:43:45,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=921190.0, ans=0.0 2024-08-11 04:43:52,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0 2024-08-11 04:43:52,660 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
30 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 04:43:55,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=921190.0, ans=0.125 2024-08-11 04:44:10,143 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 04:44:12,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=921290.0, ans=0.125 2024-08-11 04:44:14,854 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:44:16,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=921390.0, ans=0.025 2024-08-11 04:44:31,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=921490.0, ans=0.0 2024-08-11 04:44:39,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=921490.0, ans=0.125 2024-08-11 04:44:44,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=921590.0, ans=0.1 2024-08-11 04:44:45,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.32 vs. limit=10.0 2024-08-11 04:44:45,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5200, loss[loss=0.1203, beats_loss=0.00924, ecapa_loss=0.0002103, whisper_loss=0.1089, over 14897.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01146, ecapa_loss=0.0002081, whisper_loss=0.09427, over 3898938.82 frames. 
], batch size: 57, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:44:48,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-11 04:44:49,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-11 04:44:53,734 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 04:44:58,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=921590.0, ans=0.125 2024-08-11 04:45:04,642 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 04:45:06,299 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 04:45:08,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-11 04:45:11,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=921690.0, ans=0.2 2024-08-11 04:45:12,420 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 04:45:19,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-11 04:45:32,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=921890.0, ans=15.0 2024-08-11 04:45:42,322 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 04:45:43,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=921890.0, ans=0.04949747468305833 2024-08-11 04:45:47,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.623e+01 2.992e+01 3.438e+01 5.362e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-11 04:45:50,974 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 04:45:54,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=921990.0, ans=0.0 2024-08-11 04:46:00,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5250, loss[loss=0.1072, beats_loss=0.01197, ecapa_loss=0.0001917, whisper_loss=0.09327, over 23619.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01138, ecapa_loss=0.000208, whisper_loss=0.09462, over 3911752.84 frames. ], batch size: 95, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:46:01,516 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 04:46:20,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=922190.0, ans=0.0 2024-08-11 04:46:31,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=922290.0, ans=0.0 2024-08-11 04:46:43,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=922390.0, ans=0.125 2024-08-11 04:46:44,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=922390.0, ans=0.0 2024-08-11 04:46:51,164 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 04:46:55,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2024-08-11 04:47:13,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5300, loss[loss=0.1005, beats_loss=0.01144, ecapa_loss=0.0002208, whisper_loss=0.08681, over 21718.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01137, ecapa_loss=0.0002096, whisper_loss=0.09433, over 3905657.19 frames. ], batch size: 90, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:47:16,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2024-08-11 04:47:18,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=12.0 2024-08-11 04:47:59,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=922890.0, ans=0.1 2024-08-11 04:48:07,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=922890.0, ans=0.2 2024-08-11 04:48:11,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.730e+01 3.116e+01 3.540e+01 5.766e+01, threshold=6.232e+01, percent-clipped=0.0 2024-08-11 04:48:24,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5350, loss[loss=0.0927, beats_loss=0.01133, ecapa_loss=0.0002596, whisper_loss=0.07877, over 18039.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01132, ecapa_loss=0.0002095, whisper_loss=0.09442, over 3887596.46 frames. 
], batch size: 76, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:48:26,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=923090.0, ans=0.0 2024-08-11 04:48:30,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=923090.0, ans=0.125 2024-08-11 04:48:34,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-11 04:48:36,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923090.0, ans=0.1 2024-08-11 04:48:46,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=923190.0, ans=0.0 2024-08-11 04:48:51,064 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 04:48:53,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=923290.0, ans=0.2 2024-08-11 04:48:55,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=923290.0, ans=0.025 2024-08-11 04:49:09,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923390.0, ans=0.1 2024-08-11 04:49:22,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=923490.0, ans=0.0 2024-08-11 04:49:36,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5400, loss[loss=0.1295, beats_loss=0.01085, ecapa_loss=0.0002405, whisper_loss=0.1162, over 21160.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01128, ecapa_loss=0.000209, whisper_loss=0.09471, over 3898231.46 frames. 
], batch size: 86, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:49:40,766 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 04:49:43,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=923590.0, ans=0.125 2024-08-11 04:49:59,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-11 04:50:10,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.27 vs. limit=22.5 2024-08-11 04:50:11,425 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 04:50:26,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=923890.0, ans=0.125 2024-08-11 04:50:31,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.648e+01 2.918e+01 3.540e+01 6.193e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 04:50:43,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5450, loss[loss=0.1006, beats_loss=0.01344, ecapa_loss=0.0001542, whisper_loss=0.08559, over 14718.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0114, ecapa_loss=0.0002084, whisper_loss=0.09399, over 3893467.85 frames. ], batch size: 58, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:51:26,013 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 04:51:30,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=924390.0, ans=0.1 2024-08-11 04:51:30,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-08-11 04:51:35,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=924390.0, ans=0.125 2024-08-11 04:51:36,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=924490.0, ans=0.2 2024-08-11 04:51:44,260 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:51:50,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5500, loss[loss=0.1052, beats_loss=0.01221, ecapa_loss=0.0001624, whisper_loss=0.09138, over 22439.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01139, ecapa_loss=0.0002073, whisper_loss=0.09403, over 3874634.34 frames. ], batch size: 89, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:51:53,451 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 04:51:54,594 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 04:52:01,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=924590.0, ans=0.0 2024-08-11 04:52:04,630 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.204e+00 2024-08-11 04:52:26,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=924790.0, ans=0.0 2024-08-11 04:52:44,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.644e+01 3.103e+01 3.543e+01 6.260e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 04:52:51,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=924990.0, ans=0.125 2024-08-11 04:52:56,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5550, loss[loss=0.119, beats_loss=0.01111, ecapa_loss=0.0002046, whisper_loss=0.1058, over 23620.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01137, ecapa_loss=0.0002105, whisper_loss=0.0937, over 3859639.80 frames. ], batch size: 92, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:53:12,481 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 04:53:16,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=925190.0, ans=0.1 2024-08-11 04:53:25,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=925290.0, ans=0.09899494936611666 2024-08-11 04:53:29,225 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 04:53:34,203 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
30 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 04:53:43,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=925390.0, ans=0.0 2024-08-11 04:53:48,424 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-11 04:53:56,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=925490.0, ans=0.0 2024-08-11 04:53:57,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=925490.0, ans=0.125 2024-08-11 04:54:01,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5600, loss[loss=0.1053, beats_loss=0.01277, ecapa_loss=0.0002034, whisper_loss=0.09048, over 21712.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01137, ecapa_loss=0.0002112, whisper_loss=0.09379, over 3866390.56 frames. ], batch size: 88, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:54:07,121 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 04:54:07,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=925590.0, ans=0.125 2024-08-11 04:54:17,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=925690.0, ans=0.125 2024-08-11 04:54:20,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-08-11 04:54:23,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=925690.0, ans=10.0 2024-08-11 04:54:26,614 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 04:54:28,677 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.977e+00 2024-08-11 04:54:37,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=925790.0, ans=0.0 2024-08-11 04:54:54,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.711e+01 3.123e+01 3.568e+01 9.227e+01, threshold=6.245e+01, percent-clipped=1.0 2024-08-11 04:54:57,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=925990.0, ans=0.125 2024-08-11 04:55:05,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5650, loss[loss=0.118, beats_loss=0.01122, ecapa_loss=0.0001838, whisper_loss=0.105, over 22527.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01143, ecapa_loss=0.0002101, whisper_loss=0.0939, over 3875046.15 frames. ], batch size: 90, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:55:07,466 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 04:55:13,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=926090.0, ans=0.2 2024-08-11 04:55:24,166 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 04:55:52,622 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 04:56:10,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2024-08-11 04:56:10,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5700, loss[loss=0.105, beats_loss=0.01033, ecapa_loss=0.0002046, whisper_loss=0.09263, over 22341.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01149, ecapa_loss=0.0002105, whisper_loss=0.09343, over 3891244.37 frames. ], batch size: 90, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:56:20,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=926590.0, ans=0.0 2024-08-11 04:56:21,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=926590.0, ans=0.125 2024-08-11 04:56:46,040 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 04:56:46,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=926790.0, ans=0.125 2024-08-11 04:56:46,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=926790.0, ans=0.0 2024-08-11 04:56:47,314 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 04:56:55,353 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 04:56:57,836 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 04:57:03,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.796e+01 3.057e+01 3.549e+01 5.833e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 04:57:11,797 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 04:57:15,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5750, loss[loss=0.1235, beats_loss=0.007595, ecapa_loss=0.0001918, whisper_loss=0.114, over 15561.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01152, ecapa_loss=0.0002102, whisper_loss=0.09329, over 3933669.98 frames. 
], batch size: 57, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:57:21,035 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 11 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 04:57:28,819 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 04:57:36,695 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 04:57:38,200 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 04:57:56,836 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 04:58:08,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=927490.0, ans=0.2 2024-08-11 04:58:14,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-11 04:58:17,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=927490.0, ans=0.0 2024-08-11 04:58:21,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5800, loss[loss=0.1136, beats_loss=0.01125, ecapa_loss=0.0002398, whisper_loss=0.09993, over 20187.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01151, ecapa_loss=0.0002099, whisper_loss=0.09332, over 3899972.52 frames. 
], batch size: 84, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:58:21,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=927590.0, ans=0.0 2024-08-11 04:58:23,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=927590.0, ans=0.05 2024-08-11 04:58:37,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=927690.0, ans=0.1 2024-08-11 04:58:39,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.64 vs. limit=5.0 2024-08-11 04:58:47,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927790.0, ans=0.1 2024-08-11 04:58:48,528 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 04:59:12,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=927990.0, ans=0.025 2024-08-11 04:59:14,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.689e+01 2.933e+01 3.272e+01 5.873e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 04:59:18,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.98 vs. limit=12.0 2024-08-11 04:59:25,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=928090.0, ans=0.125 2024-08-11 04:59:25,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5850, loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0002152, whisper_loss=0.08928, over 20163.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01149, ecapa_loss=0.0002098, whisper_loss=0.09334, over 3907212.13 frames. ], batch size: 80, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:59:26,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-11 04:59:45,483 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 04:59:46,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=928190.0, ans=0.0 2024-08-11 04:59:50,473 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 04:59:56,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=928290.0, ans=0.1 2024-08-11 04:59:56,945 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 05:00:06,447 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 05:00:18,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=928490.0, ans=0.2 2024-08-11 05:00:20,828 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 05:00:29,586 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 05:00:30,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5900, loss[loss=0.09772, beats_loss=0.01057, ecapa_loss=0.000231, whisper_loss=0.08484, over 22209.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01153, ecapa_loss=0.0002113, whisper_loss=0.09258, over 3917637.94 frames. 
], batch size: 91, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:00:36,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-11 05:00:44,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=928690.0, ans=0.125 2024-08-11 05:00:49,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=928690.0, ans=0.125 2024-08-11 05:01:01,066 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 05:01:03,875 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 05:01:04,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=928790.0, ans=0.125 2024-08-11 05:01:09,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=928890.0, ans=0.125 2024-08-11 05:01:24,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.592e+01 2.867e+01 3.350e+01 5.876e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-11 05:01:36,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 5950, loss[loss=0.09625, beats_loss=0.01411, ecapa_loss=0.0001921, whisper_loss=0.08022, over 17170.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01149, ecapa_loss=0.0002104, whisper_loss=0.09315, over 3898357.25 frames. ], batch size: 71, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:01:40,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=929090.0, ans=0.125 2024-08-11 05:01:43,834 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 05:02:02,410 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 05:02:02,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=929290.0, ans=0.0 2024-08-11 05:02:03,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=929290.0, ans=0.125 2024-08-11 05:02:15,671 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 05:02:22,379 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 05:02:36,471 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 05:02:41,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6000, loss[loss=0.09345, beats_loss=0.01211, ecapa_loss=0.0002184, whisper_loss=0.07916, over 16345.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01156, ecapa_loss=0.0002075, whisper_loss=0.09321, over 3941961.40 frames. ], batch size: 65, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:02:41,668 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 05:03:21,206 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on ASR_libri: loss=0.2594, beats_loss=0, ecapa_loss=0.0006753, whisper_loss=0.2527, over 922467.00 frames. 2024-08-11 05:03:38,364 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on SV_voxceleb1: loss=0.005594, beats_loss=0, ecapa_loss=0.0005594, whisper_loss=0, over 939242.00 frames. 2024-08-11 05:05:33,504 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 05:05:33,508 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 05:05:39,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-08-11 05:05:40,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=929590.0, ans=0.125 2024-08-11 05:06:19,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0 2024-08-11 05:06:27,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.542e+01 2.913e+01 3.356e+01 5.863e+01, threshold=5.826e+01, percent-clipped=1.0 2024-08-11 05:06:33,730 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 05:06:38,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6050, loss[loss=0.09645, beats_loss=0.01213, ecapa_loss=0.0002624, whisper_loss=0.0817, over 19280.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01146, ecapa_loss=0.000207, whisper_loss=0.09361, over 3922380.42 frames. ], batch size: 83, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:06:48,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=930090.0, ans=0.0 2024-08-11 05:07:02,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=930190.0, ans=0.0 2024-08-11 05:07:08,865 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 05:07:15,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=930290.0, ans=0.1 2024-08-11 05:07:16,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=930390.0, ans=0.125 2024-08-11 05:07:37,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=930490.0, ans=0.0 2024-08-11 05:07:41,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=930490.0, ans=0.125 2024-08-11 05:07:43,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6100, loss[loss=0.1234, beats_loss=0.01229, ecapa_loss=0.0001508, whisper_loss=0.1096, over 18962.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0002084, whisper_loss=0.0936, over 3894965.08 frames. ], batch size: 72, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:07:51,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=930590.0, ans=0.0 2024-08-11 05:07:52,930 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 05:07:58,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=930690.0, ans=0.125 2024-08-11 05:07:59,239 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 05:08:23,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. 
limit=10.0 2024-08-11 05:08:29,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=930890.0, ans=0.2 2024-08-11 05:08:37,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.611e+01 2.902e+01 3.349e+01 2.714e+02, threshold=5.803e+01, percent-clipped=1.0 2024-08-11 05:08:37,283 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 05:08:49,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6150, loss[loss=0.08801, beats_loss=0.01447, ecapa_loss=0.0002131, whisper_loss=0.07141, over 17490.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01152, ecapa_loss=0.0002097, whisper_loss=0.0927, over 3865222.71 frames. ], batch size: 71, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:09:06,566 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 05:09:28,845 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 05:09:46,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=931490.0, ans=0.125 2024-08-11 05:09:54,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6200, loss[loss=0.1004, beats_loss=0.009055, ecapa_loss=0.0002225, whisper_loss=0.08912, over 14799.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01149, ecapa_loss=0.0002075, whisper_loss=0.09318, over 3895792.59 frames. ], batch size: 57, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:10:23,473 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 05:10:30,012 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 05:10:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=931990.0, ans=0.025 2024-08-11 05:10:48,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 2.725e+01 3.050e+01 3.372e+01 5.411e+01, threshold=6.100e+01, percent-clipped=0.0 2024-08-11 05:10:58,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=931990.0, ans=0.125 2024-08-11 05:11:00,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6250, loss[loss=0.1164, beats_loss=0.01219, ecapa_loss=0.0002231, whisper_loss=0.102, over 20906.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01158, ecapa_loss=0.0002075, whisper_loss=0.09257, over 3906695.89 frames. ], batch size: 84, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:11:08,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=932090.0, ans=0.125 2024-08-11 05:11:27,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2024-08-11 05:11:27,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=932290.0, ans=0.125 2024-08-11 05:11:28,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5 2024-08-11 05:11:31,739 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 05:11:35,797 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 05:11:38,542 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 05:12:00,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-08-11 05:12:05,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6300, loss[loss=0.1247, beats_loss=0.01112, ecapa_loss=0.0002198, whisper_loss=0.1114, over 22883.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01161, ecapa_loss=0.0002075, whisper_loss=0.09222, over 3876867.70 frames. ], batch size: 94, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:12:10,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932590.0, ans=0.1 2024-08-11 05:12:15,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=932590.0, ans=0.125 2024-08-11 05:12:31,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=932790.0, ans=0.125 2024-08-11 05:12:39,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-08-11 05:12:58,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=932990.0, ans=0.125 2024-08-11 05:12:59,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.671e+01 3.003e+01 3.406e+01 5.856e+01, threshold=6.007e+01, percent-clipped=0.0 2024-08-11 05:13:04,059 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 05:13:04,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=932990.0, ans=0.2 2024-08-11 05:13:07,890 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
22 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-11 05:13:11,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6350, loss[loss=0.1178, beats_loss=0.01037, ecapa_loss=0.000219, whisper_loss=0.1052, over 22979.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01161, ecapa_loss=0.0002088, whisper_loss=0.09288, over 3908627.41 frames. ], batch size: 92, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:13:43,546 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 05:13:58,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=933390.0, ans=0.125 2024-08-11 05:14:06,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=933490.0, ans=0.0 2024-08-11 05:14:14,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=933490.0, ans=0.2 2024-08-11 05:14:21,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6400, loss[loss=0.1065, beats_loss=0.01177, ecapa_loss=0.0002193, whisper_loss=0.09257, over 22057.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01166, ecapa_loss=0.0002083, whisper_loss=0.09279, over 3908687.51 frames. ], batch size: 88, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:14:31,072 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 05:14:43,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=933690.0, ans=0.0 2024-08-11 05:15:02,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2024-08-11 05:15:08,254 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 05:15:17,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.766e+01 3.115e+01 3.539e+01 7.313e+01, threshold=6.229e+01, percent-clipped=3.0 2024-08-11 05:15:29,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2024-08-11 05:15:29,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6450, loss[loss=0.1152, beats_loss=0.009164, ecapa_loss=0.0002386, whisper_loss=0.1037, over 17327.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01156, ecapa_loss=0.0002097, whisper_loss=0.09346, over 3906818.67 frames. ], batch size: 69, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:15:43,203 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 05:16:01,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=934290.0, ans=0.125 2024-08-11 05:16:37,277 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 05:16:40,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=934490.0, ans=22.5 2024-08-11 05:16:42,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=934590.0, ans=0.125 2024-08-11 05:16:42,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6500, loss[loss=0.1156, beats_loss=0.01076, ecapa_loss=0.0002087, whisper_loss=0.1028, over 21018.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0115, ecapa_loss=0.0002086, whisper_loss=0.09456, over 3938778.84 frames. 
], batch size: 84, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:16:46,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-08-11 05:17:07,492 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 05:17:07,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=934690.0, ans=0.125 2024-08-11 05:17:17,148 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 05:17:37,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=934890.0, ans=0.0 2024-08-11 05:17:42,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.816e+01 3.248e+01 3.661e+01 5.361e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-11 05:17:51,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=934990.0, ans=0.025 2024-08-11 05:17:52,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=934990.0, ans=0.1 2024-08-11 05:17:55,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6550, loss[loss=0.1231, beats_loss=0.01057, ecapa_loss=0.000223, whisper_loss=0.1103, over 22632.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01147, ecapa_loss=0.0002093, whisper_loss=0.09485, over 3942351.25 frames. ], batch size: 90, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:18:17,295 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 05:18:23,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. 
limit=12.0 2024-08-11 05:18:23,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-08-11 05:18:24,592 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 from AS 2024-08-11 05:18:29,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=935290.0, ans=0.035 2024-08-11 05:18:29,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=935290.0, ans=0.0 2024-08-11 05:18:44,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=935390.0, ans=0.05 2024-08-11 05:18:56,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=935490.0, ans=0.0 2024-08-11 05:18:57,521 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 from AS 2024-08-11 05:18:58,962 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 from AS 2024-08-11 05:19:11,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6600, loss[loss=0.1094, beats_loss=0.01052, ecapa_loss=0.0001882, whisper_loss=0.097, over 21073.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01142, ecapa_loss=0.0002086, whisper_loss=0.0954, over 3954056.26 frames. ], batch size: 81, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:19:30,623 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 05:19:36,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=935690.0, ans=0.0 2024-08-11 05:19:44,035 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 05:19:54,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=935890.0, ans=0.125 2024-08-11 05:19:54,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=935890.0, ans=10.0 2024-08-11 05:19:57,722 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 from AS 2024-08-11 05:20:07,962 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 05:20:11,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=935990.0, ans=0.2 2024-08-11 05:20:11,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.766e+01 3.102e+01 3.582e+01 5.637e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-11 05:20:25,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6650, loss[loss=0.0998, beats_loss=0.0127, ecapa_loss=0.0001853, whisper_loss=0.08525, over 17012.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0114, ecapa_loss=0.0002095, whisper_loss=0.09535, over 3972724.73 frames. ], batch size: 69, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:20:41,117 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-11 05:20:42,407 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 from AS 2024-08-11 05:20:47,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. 
limit=12.0 2024-08-11 05:20:50,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936190.0, ans=0.1 2024-08-11 05:20:55,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=936290.0, ans=22.5 2024-08-11 05:20:59,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=936290.0, ans=0.05 2024-08-11 05:21:02,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=936290.0, ans=0.125 2024-08-11 05:21:17,334 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 15 from Vox, 18 from AS 2024-08-11 05:21:27,205 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 12 from Vox, 27 from AS 2024-08-11 05:21:31,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-11 05:21:40,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6700, loss[loss=0.1224, beats_loss=0.009587, ecapa_loss=0.0002943, whisper_loss=0.1098, over 21293.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01129, ecapa_loss=0.0002105, whisper_loss=0.09556, over 3975785.29 frames. ], batch size: 90, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:21:46,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=936590.0, ans=0.0 2024-08-11 05:22:02,054 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 9 from Vox, 24 from AS 2024-08-11 05:22:05,072 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 12 from LS+wenet, 18 from Vox, 34 from AS 2024-08-11 05:22:13,309 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 05:22:14,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=936790.0, ans=0.125 2024-08-11 05:22:19,278 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 05:22:20,497 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 19 from Vox, 38 from AS 2024-08-11 05:22:25,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=936890.0, ans=0.0 2024-08-11 05:22:36,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=936990.0, ans=0.2 2024-08-11 05:22:39,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.752e+01 3.187e+01 3.868e+01 6.125e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-11 05:22:52,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6750, loss[loss=0.1089, beats_loss=0.008091, ecapa_loss=0.00023, whisper_loss=0.09853, over 23233.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0113, ecapa_loss=0.0002099, whisper_loss=0.09514, over 3919538.64 frames. ], batch size: 87, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:23:09,004 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 05:23:15,781 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS 2024-08-11 05:23:19,231 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 26 from Vox, 36 from AS 2024-08-11 05:23:23,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=937290.0, ans=0.05 2024-08-11 05:23:24,725 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
22 from LS+wenet, 14 from Vox, 20 from AS 2024-08-11 05:23:48,718 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.703e-02 2024-08-11 05:24:05,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937590.0, ans=0.1 2024-08-11 05:24:06,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6800, loss[loss=0.09203, beats_loss=0.01227, ecapa_loss=0.0001692, whisper_loss=0.07806, over 14452.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0112, ecapa_loss=0.0002108, whisper_loss=0.095, over 3885038.47 frames. ], batch size: 55, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:24:08,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=937590.0, ans=0.125 2024-08-11 05:24:14,056 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03 2024-08-11 05:24:34,070 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 11 from Vox, 37 from AS 2024-08-11 05:24:44,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=937790.0, ans=0.125 2024-08-11 05:24:49,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937890.0, ans=0.1 2024-08-11 05:24:56,017 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 05:25:05,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.742e+01 3.088e+01 3.392e+01 5.512e+01, threshold=6.176e+01, percent-clipped=0.0 2024-08-11 05:25:10,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=937990.0, ans=0.2 2024-08-11 05:25:18,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-11 05:25:18,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6850, loss[loss=0.1246, beats_loss=0.01029, ecapa_loss=0.0001949, whisper_loss=0.1124, over 23072.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01123, ecapa_loss=0.0002104, whisper_loss=0.09487, over 3891385.00 frames. ], batch size: 93, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:25:35,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2024-08-11 05:25:55,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=938290.0, ans=0.125 2024-08-11 05:26:13,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=938390.0, ans=0.0 2024-08-11 05:26:23,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=938490.0, ans=0.2 2024-08-11 05:26:33,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6900, loss[loss=0.09266, beats_loss=0.008639, ecapa_loss=0.0002362, whisper_loss=0.08166, over 17168.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01135, ecapa_loss=0.0002101, whisper_loss=0.09447, over 3890472.70 frames. 
], batch size: 69, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:26:41,385 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 from AS 2024-08-11 05:26:46,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=938590.0, ans=0.2 2024-08-11 05:27:14,164 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 from AS 2024-08-11 05:27:22,504 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 05:27:24,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=938890.0, ans=0.125 2024-08-11 05:27:29,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=938890.0, ans=0.0 2024-08-11 05:27:34,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.641e+01 3.049e+01 3.440e+01 6.351e+01, threshold=6.099e+01, percent-clipped=1.0 2024-08-11 05:27:36,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2024-08-11 05:27:38,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=938990.0, ans=0.0 2024-08-11 05:27:48,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 6950, loss[loss=0.1283, beats_loss=0.008835, ecapa_loss=0.0002518, whisper_loss=0.1169, over 19428.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0114, ecapa_loss=0.0002091, whisper_loss=0.09457, over 3884403.68 frames. ], batch size: 78, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:27:49,825 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 05:27:52,160 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-11 05:28:11,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=939190.0, ans=0.125 2024-08-11 05:28:17,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=939290.0, ans=0.0 2024-08-11 05:28:21,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=939290.0, ans=0.1 2024-08-11 05:28:25,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2024-08-11 05:28:28,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=939290.0, ans=0.125 2024-08-11 05:28:43,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=939390.0, ans=0.1 2024-08-11 05:28:45,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.72 vs. limit=22.5 2024-08-11 05:29:01,202 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7000, loss[loss=0.1133, beats_loss=0.01061, ecapa_loss=0.0001953, whisper_loss=0.1007, over 17971.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01146, ecapa_loss=0.0002074, whisper_loss=0.09426, over 3875188.28 frames. ], batch size: 71, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:29:01,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=939590.0, ans=0.125 2024-08-11 05:29:11,743 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
20 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 05:29:22,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-11 05:29:23,267 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 05:29:25,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=939690.0, ans=0.125 2024-08-11 05:29:42,600 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 22 from Vox, 18 from AS 2024-08-11 05:29:45,105 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-11 05:29:46,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2024-08-11 05:29:50,324 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 from AS 2024-08-11 05:29:59,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.694e+01 2.915e+01 3.195e+01 8.375e+01, threshold=5.830e+01, percent-clipped=1.0 2024-08-11 05:30:05,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=939990.0, ans=0.125 2024-08-11 05:30:11,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7050, loss[loss=0.0936, beats_loss=0.01133, ecapa_loss=0.00026, whisper_loss=0.07966, over 19169.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002101, whisper_loss=0.09389, over 3860671.71 frames. ], batch size: 84, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:30:19,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. 
limit=15.0 2024-08-11 05:30:25,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=940190.0, ans=0.125 2024-08-11 05:30:37,722 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 05:30:41,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=12.0 2024-08-11 05:30:55,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=940390.0, ans=0.125 2024-08-11 05:31:07,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=940490.0, ans=0.125 2024-08-11 05:31:18,591 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 19 from Vox, 40 from AS 2024-08-11 05:31:22,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7100, loss[loss=0.0819, beats_loss=0.01433, ecapa_loss=0.0001833, whisper_loss=0.06574, over 14571.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002093, whisper_loss=0.09384, over 3882807.05 frames. ], batch size: 58, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:31:24,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-11 05:31:46,737 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 05:31:49,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=940790.0, ans=0.125 2024-08-11 05:31:57,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. 
limit=15.0 2024-08-11 05:32:02,331 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 from AS 2024-08-11 05:32:04,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=940890.0, ans=0.0 2024-08-11 05:32:19,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=940990.0, ans=0.2 2024-08-11 05:32:20,670 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.694e+01 2.982e+01 3.283e+01 5.309e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 05:32:33,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7150, loss[loss=0.1172, beats_loss=0.0125, ecapa_loss=0.0001896, whisper_loss=0.1028, over 22938.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01132, ecapa_loss=0.0002093, whisper_loss=0.09474, over 3881753.35 frames. ], batch size: 91, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:32:39,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=941090.0, ans=0.125 2024-08-11 05:32:40,739 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS 2024-08-11 05:32:54,599 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-11 05:33:31,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-11 05:33:37,211 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 22 from Vox, 29 from AS 2024-08-11 05:33:45,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941490.0, ans=0.1 2024-08-11 05:33:50,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7200, loss[loss=0.1202, beats_loss=0.01061, ecapa_loss=0.0001902, whisper_loss=0.1077, over 20162.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01128, ecapa_loss=0.0002111, whisper_loss=0.09458, over 3881340.21 frames. ], batch size: 81, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:33:52,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=941590.0, ans=0.5 2024-08-11 05:33:53,537 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-11 05:34:26,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941790.0, ans=0.125 2024-08-11 05:34:30,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=941790.0, ans=0.125 2024-08-11 05:34:32,157 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 from AS 2024-08-11 05:34:53,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.691e+01 3.038e+01 3.510e+01 5.388e+01, threshold=6.075e+01, percent-clipped=0.0 2024-08-11 05:34:53,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=941990.0, ans=0.125 2024-08-11 05:35:00,677 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 05:35:06,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7250, loss[loss=0.119, beats_loss=0.01121, ecapa_loss=0.0001998, whisper_loss=0.1058, over 22692.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01124, ecapa_loss=0.0002108, whisper_loss=0.09425, over 3880279.93 frames. ], batch size: 91, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:35:14,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=12.0 2024-08-11 05:35:21,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=12.0 2024-08-11 05:35:26,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2024-08-11 05:35:31,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=942190.0, ans=0.0 2024-08-11 05:35:39,823 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-11 05:35:46,298 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-11 05:35:56,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=942390.0, ans=0.0 2024-08-11 05:36:07,267 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 25 from Vox, 46 from AS 2024-08-11 05:36:15,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=942490.0, ans=0.04949747468305833 2024-08-11 05:36:15,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=942490.0, ans=0.125 2024-08-11 05:36:17,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. 
limit=12.0 2024-08-11 05:36:19,818 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 05:36:24,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7300, loss[loss=0.07738, beats_loss=0.01354, ecapa_loss=0.0001943, whisper_loss=0.0619, over 17440.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01132, ecapa_loss=0.0002098, whisper_loss=0.09441, over 3903102.63 frames. ], batch size: 71, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:36:34,880 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-11 05:36:38,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=942590.0, ans=0.125 2024-08-11 05:36:43,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2024-08-11 05:36:51,120 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 from AS 2024-08-11 05:36:53,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2024-08-11 05:37:02,190 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 from AS 2024-08-11 05:37:02,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=942790.0, ans=0.2 2024-08-11 05:37:06,446 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 05:37:19,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=942890.0, ans=0.125 2024-08-11 05:37:28,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.619e+01 2.865e+01 3.274e+01 5.323e+01, threshold=5.731e+01, percent-clipped=0.0 2024-08-11 05:37:33,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=942990.0, ans=0.125 2024-08-11 05:37:42,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7350, loss[loss=0.1232, beats_loss=0.009681, ecapa_loss=0.0002205, whisper_loss=0.1113, over 21279.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01147, ecapa_loss=0.000208, whisper_loss=0.09377, over 3905968.00 frames. ], batch size: 89, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:37:59,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. 
limit=15.0 2024-08-11 05:38:00,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=943190.0, ans=0.125 2024-08-11 05:38:12,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943190.0, ans=0.1 2024-08-11 05:38:38,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=943390.0, ans=0.5 2024-08-11 05:38:55,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=943490.0, ans=0.125 2024-08-11 05:38:59,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=943490.0, ans=10.0 2024-08-11 05:39:02,926 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 05:39:04,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7400, loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0002125, whisper_loss=0.08847, over 16702.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01147, ecapa_loss=0.0002084, whisper_loss=0.09385, over 3891571.09 frames. ], batch size: 67, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:39:54,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=943890.0, ans=0.1 2024-08-11 05:39:57,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=943890.0, ans=0.2 2024-08-11 05:40:04,105 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 27 from Vox, 40 from AS 2024-08-11 05:40:04,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=943890.0, ans=0.0 2024-08-11 05:40:12,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.754e+01 3.134e+01 3.578e+01 6.308e+01, threshold=6.268e+01, percent-clipped=2.0 2024-08-11 05:40:24,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=943990.0, ans=0.0 2024-08-11 05:40:27,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7450, loss[loss=0.1167, beats_loss=0.009414, ecapa_loss=0.0002111, whisper_loss=0.1051, over 18647.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01144, ecapa_loss=0.0002095, whisper_loss=0.09432, over 3903943.89 frames. ], batch size: 74, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:40:49,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-08-11 05:40:59,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=944290.0, ans=0.125 2024-08-11 05:41:01,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=944290.0, ans=0.2 2024-08-11 05:41:03,787 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-11 05:41:12,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=944290.0, ans=0.125 2024-08-11 05:41:21,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.67 vs. 
limit=15.0 2024-08-11 05:41:50,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7500, loss[loss=0.1127, beats_loss=0.008448, ecapa_loss=0.0002521, whisper_loss=0.1017, over 16067.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01145, ecapa_loss=0.0002088, whisper_loss=0.09388, over 3879017.43 frames. ], batch size: 64, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:41:52,558 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 05:41:57,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=944590.0, ans=0.125 2024-08-11 05:41:59,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=944590.0, ans=0.2 2024-08-11 05:42:00,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=944590.0, ans=0.0 2024-08-11 05:42:05,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-11 05:42:25,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=944790.0, ans=0.125 2024-08-11 05:42:37,755 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 05:42:38,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=944890.0, ans=0.0 2024-08-11 05:42:54,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.583e+01 2.883e+01 3.295e+01 6.050e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 05:43:04,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=944990.0, ans=0.0 2024-08-11 05:43:05,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=22.5 2024-08-11 05:43:08,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7550, loss[loss=0.1132, beats_loss=0.01036, ecapa_loss=0.0002019, whisper_loss=0.1009, over 22118.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01146, ecapa_loss=0.0002078, whisper_loss=0.09339, over 3838803.24 frames. ], batch size: 88, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:43:10,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=945090.0, ans=0.125 2024-08-11 05:43:15,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=945090.0, ans=0.125 2024-08-11 05:43:43,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=945290.0, ans=0.125 2024-08-11 05:43:53,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=945390.0, ans=0.2 2024-08-11 05:44:14,796 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
16 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-11 05:44:15,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=945490.0, ans=0.125 2024-08-11 05:44:19,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=945490.0, ans=0.125 2024-08-11 05:44:21,401 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-11 05:44:25,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7600, loss[loss=0.1175, beats_loss=0.01372, ecapa_loss=0.0002376, whisper_loss=0.1014, over 22476.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01143, ecapa_loss=0.0002079, whisper_loss=0.09401, over 3846041.64 frames. ], batch size: 91, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:44:26,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-11 05:44:26,832 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-11 05:44:29,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-11 05:44:31,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. 
limit=15.0 2024-08-11 05:44:34,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=945590.0, ans=0.1 2024-08-11 05:44:53,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=945790.0, ans=0.125 2024-08-11 05:44:53,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=945790.0, ans=0.0 2024-08-11 05:44:54,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=945790.0, ans=10.0 2024-08-11 05:45:07,425 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 05:45:17,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=945890.0, ans=0.09899494936611666 2024-08-11 05:45:20,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=945890.0, ans=0.2 2024-08-11 05:45:25,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=945990.0, ans=0.125 2024-08-11 05:45:27,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.611e+01 2.976e+01 3.513e+01 5.739e+01, threshold=5.952e+01, percent-clipped=0.0 2024-08-11 05:45:37,542 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 05:45:39,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=945990.0, ans=0.125 2024-08-11 05:45:41,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7650, loss[loss=0.1013, beats_loss=0.01212, ecapa_loss=0.0001934, whisper_loss=0.08728, over 22627.00 frames. 
], tot_loss[loss=0.1071, beats_loss=0.01142, ecapa_loss=0.0002087, whisper_loss=0.09363, over 3849398.72 frames. ], batch size: 89, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:45:58,015 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.069e+03 2024-08-11 05:45:59,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=946190.0, ans=0.0 2024-08-11 05:46:05,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=946190.0, ans=0.1 2024-08-11 05:46:17,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=946290.0, ans=0.0 2024-08-11 05:46:32,831 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 05:46:58,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7700, loss[loss=0.1251, beats_loss=0.009493, ecapa_loss=0.0002202, whisper_loss=0.1134, over 16823.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01146, ecapa_loss=0.0002081, whisper_loss=0.09314, over 3837467.15 frames. ], batch size: 65, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:47:15,133 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 05:47:21,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=946690.0, ans=0.2 2024-08-11 05:47:22,987 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 05:47:36,762 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 05:47:41,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=946790.0, ans=0.0 2024-08-11 05:47:47,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=946890.0, ans=0.2 2024-08-11 05:48:03,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.753e+01 2.991e+01 3.515e+01 5.898e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 05:48:04,271 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 05:48:12,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.78 vs. limit=12.0 2024-08-11 05:48:17,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7750, loss[loss=0.1059, beats_loss=0.01181, ecapa_loss=0.0002466, whisper_loss=0.09162, over 21359.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01143, ecapa_loss=0.0002088, whisper_loss=0.09344, over 3886162.87 frames. ], batch size: 91, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:48:30,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=947090.0, ans=0.125 2024-08-11 05:48:34,792 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 05:48:50,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=947290.0, ans=0.125 2024-08-11 05:48:57,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=947290.0, ans=0.0 2024-08-11 05:49:11,765 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 05:49:13,104 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 05:49:14,722 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 05:49:25,595 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 05:49:27,932 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 05:49:30,749 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 05:49:36,165 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7800, loss[loss=0.09578, beats_loss=0.01154, ecapa_loss=0.0001967, whisper_loss=0.08227, over 19225.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0002078, whisper_loss=0.09327, over 3850762.52 frames. ], batch size: 77, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:49:53,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-08-11 05:49:56,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.65 vs. limit=10.0 2024-08-11 05:50:06,068 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 12 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 05:50:09,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. 
limit=15.0 2024-08-11 05:50:13,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=947790.0, ans=0.1 2024-08-11 05:50:15,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=947790.0, ans=0.125 2024-08-11 05:50:16,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=947790.0, ans=0.5 2024-08-11 05:50:38,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=947990.0, ans=0.125 2024-08-11 05:50:39,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.753e+01 3.128e+01 3.537e+01 5.360e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 05:50:44,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=947990.0, ans=10.0 2024-08-11 05:50:53,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7850, loss[loss=0.09515, beats_loss=0.01293, ecapa_loss=0.0001486, whisper_loss=0.08073, over 15674.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01144, ecapa_loss=0.0002074, whisper_loss=0.09416, over 3856219.89 frames. ], batch size: 60, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:50:57,289 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 05:51:00,018 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-11 05:51:42,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=948390.0, ans=0.2 2024-08-11 05:51:46,379 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 05:52:09,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7900, loss[loss=0.1133, beats_loss=0.0122, ecapa_loss=0.0001904, whisper_loss=0.09918, over 22476.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01144, ecapa_loss=0.0002063, whisper_loss=0.09513, over 3911027.61 frames. ], batch size: 88, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:52:21,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=948590.0, ans=0.125 2024-08-11 05:52:53,488 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 05:53:01,029 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 05:53:14,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.621e+01 3.000e+01 3.506e+01 5.251e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-11 05:53:15,253 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 05:53:23,988 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 05:53:24,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=948990.0, ans=0.0 2024-08-11 05:53:29,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 7950, loss[loss=0.1222, beats_loss=0.01108, ecapa_loss=0.0001777, whisper_loss=0.1093, over 20877.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01146, ecapa_loss=0.0002052, whisper_loss=0.09514, over 3911290.60 frames. ], batch size: 80, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:53:42,058 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 05:53:43,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=949190.0, ans=0.2 2024-08-11 05:53:45,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=949190.0, ans=0.125 2024-08-11 05:53:52,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=949190.0, ans=0.0 2024-08-11 05:53:55,747 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.580e-01 2024-08-11 05:54:01,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=949290.0, ans=0.0 2024-08-11 05:54:27,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=12.0 2024-08-11 05:54:36,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2024-08-11 05:54:36,540 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 05:54:42,857 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 05:54:50,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8000, loss[loss=0.1087, beats_loss=0.01241, ecapa_loss=0.0002139, whisper_loss=0.09416, over 21985.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0114, ecapa_loss=0.000206, whisper_loss=0.09509, over 3880060.02 frames. ], batch size: 92, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:54:50,288 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 05:55:00,909 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 05:55:05,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=949690.0, ans=0.0 2024-08-11 05:55:10,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-11 05:55:21,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=949790.0, ans=0.2 2024-08-11 05:55:26,455 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 05:55:57,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2024-08-11 05:55:58,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.705e+01 3.037e+01 3.592e+01 7.289e+01, threshold=6.074e+01, percent-clipped=2.0 2024-08-11 05:56:04,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949990.0, ans=0.1 2024-08-11 05:56:10,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8050, loss[loss=0.1034, beats_loss=0.0159, ecapa_loss=0.0001643, whisper_loss=0.08584, over 16081.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01139, ecapa_loss=0.0002072, whisper_loss=0.09491, over 3890629.34 frames. ], batch size: 67, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:56:13,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-11 05:56:38,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.74 vs. 
limit=15.0 2024-08-11 05:56:44,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2024-08-11 05:57:02,028 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 05:57:03,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=950390.0, ans=0.025 2024-08-11 05:57:26,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=950490.0, ans=0.1 2024-08-11 05:57:27,556 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:57:28,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8100, loss[loss=0.1019, beats_loss=0.01118, ecapa_loss=0.0002509, whisper_loss=0.08816, over 21098.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01135, ecapa_loss=0.0002071, whisper_loss=0.09513, over 3885055.66 frames. ], batch size: 90, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:57:28,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=950590.0, ans=0.125 2024-08-11 05:57:30,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=950590.0, ans=0.125 2024-08-11 05:57:43,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=950690.0, ans=0.0 2024-08-11 05:57:59,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.33 vs. 
limit=15.0 2024-08-11 05:58:00,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0 2024-08-11 05:58:19,835 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 05:58:35,873 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 05:58:36,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.727e+01 3.067e+01 3.354e+01 4.801e+01, threshold=6.134e+01, percent-clipped=0.0 2024-08-11 05:58:50,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=951090.0, ans=0.0 2024-08-11 05:58:51,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8150, loss[loss=0.1196, beats_loss=0.01142, ecapa_loss=0.0001605, whisper_loss=0.1066, over 19279.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0113, ecapa_loss=0.0002078, whisper_loss=0.09497, over 3864999.61 frames. ], batch size: 72, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:59:05,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=951090.0, ans=0.02 2024-08-11 05:59:31,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=951290.0, ans=0.125 2024-08-11 05:59:41,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=951390.0, ans=0.125 2024-08-11 06:00:00,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. 
limit=15.0 2024-08-11 06:00:03,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=951490.0, ans=0.04949747468305833 2024-08-11 06:00:12,905 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 06:00:13,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8200, loss[loss=0.1132, beats_loss=0.01285, ecapa_loss=0.0002047, whisper_loss=0.09831, over 22741.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01128, ecapa_loss=0.0002067, whisper_loss=0.09565, over 3896025.93 frames. ], batch size: 95, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:00:15,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=951590.0, ans=0.2 2024-08-11 06:00:16,363 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 06:00:26,863 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 06:00:27,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=12.0 2024-08-11 06:00:30,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=951690.0, ans=0.07 2024-08-11 06:00:39,281 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 06:00:48,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=951790.0, ans=0.125 2024-08-11 06:01:19,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=951990.0, ans=0.1 2024-08-11 06:01:19,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=951990.0, ans=0.125 2024-08-11 06:01:19,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.662e+01 3.047e+01 3.528e+01 2.595e+02, threshold=6.093e+01, percent-clipped=1.0 2024-08-11 06:01:34,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8250, loss[loss=0.1096, beats_loss=0.01118, ecapa_loss=0.0001862, whisper_loss=0.0966, over 22235.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01133, ecapa_loss=0.0002069, whisper_loss=0.09546, over 3902190.96 frames. ], batch size: 87, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:01:37,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=952090.0, ans=0.125 2024-08-11 06:01:45,215 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 06:01:49,722 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 06:02:17,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. 
limit=15.0 2024-08-11 06:02:20,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=952290.0, ans=0.125 2024-08-11 06:02:43,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=952490.0, ans=0.0 2024-08-11 06:02:48,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-08-11 06:02:54,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8300, loss[loss=0.09308, beats_loss=0.01402, ecapa_loss=0.0001869, whisper_loss=0.07719, over 20606.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002055, whisper_loss=0.09362, over 3857567.58 frames. ], batch size: 85, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:03:10,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-11 06:03:10,838 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-11 06:03:11,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=952690.0, ans=0.125 2024-08-11 06:03:46,098 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 06:03:46,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952890.0, ans=0.1 2024-08-11 06:03:50,810 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 06:03:52,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. 
limit=15.0 2024-08-11 06:03:58,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.727e+01 2.981e+01 3.576e+01 6.756e+01, threshold=5.962e+01, percent-clipped=1.0 2024-08-11 06:04:00,692 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-11 06:04:01,827 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 06:04:05,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-08-11 06:04:12,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8350, loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0002362, whisper_loss=0.09115, over 18034.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01143, ecapa_loss=0.0002084, whisper_loss=0.09424, over 3890326.54 frames. ], batch size: 77, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:04:16,675 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 06:04:19,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=953090.0, ans=0.125 2024-08-11 06:04:22,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=953090.0, ans=0.0 2024-08-11 06:04:24,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-11 06:04:28,490 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 06:04:29,871 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 06:04:33,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=953190.0, ans=0.1 2024-08-11 06:04:35,622 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 06:05:33,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8400, loss[loss=0.1249, beats_loss=0.008057, ecapa_loss=0.0002191, whisper_loss=0.1147, over 20564.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01138, ecapa_loss=0.000208, whisper_loss=0.09507, over 3921938.79 frames. ], batch size: 79, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:05:35,245 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 22 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-11 06:05:38,998 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 06:05:57,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=953690.0, ans=0.0 2024-08-11 06:06:21,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=953790.0, ans=0.125 2024-08-11 06:06:40,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.819e+01 3.267e+01 3.747e+01 3.320e+02, threshold=6.533e+01, percent-clipped=4.0 2024-08-11 06:06:41,030 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 06:06:54,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8450, loss[loss=0.1129, beats_loss=0.01166, ecapa_loss=0.0001895, whisper_loss=0.09933, over 22656.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01141, ecapa_loss=0.000207, whisper_loss=0.09417, over 3917481.72 frames. 
], batch size: 87, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:07:00,860 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 06:07:14,559 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 06:07:33,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. limit=10.0 2024-08-11 06:07:35,922 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 06:08:01,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954490.0, ans=0.1 2024-08-11 06:08:15,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=954490.0, ans=0.0 2024-08-11 06:08:17,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8500, loss[loss=0.1177, beats_loss=0.009281, ecapa_loss=0.0002329, whisper_loss=0.106, over 14732.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002077, whisper_loss=0.09397, over 3907948.17 frames. ], batch size: 59, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:08:18,609 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 06:08:28,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=954590.0, ans=0.2 2024-08-11 06:08:29,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=954590.0, ans=0.0 2024-08-11 06:08:31,321 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 06:08:37,874 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
22 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 06:08:38,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=954690.0, ans=0.125 2024-08-11 06:08:43,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=954690.0, ans=0.125 2024-08-11 06:08:56,146 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 06:09:23,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-11 06:09:25,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.679e+01 3.057e+01 3.369e+01 5.558e+01, threshold=6.114e+01, percent-clipped=0.0 2024-08-11 06:09:39,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8550, loss[loss=0.09727, beats_loss=0.01019, ecapa_loss=0.0002435, whisper_loss=0.08465, over 13895.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0002081, whisper_loss=0.0944, over 3894521.58 frames. ], batch size: 55, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:10:04,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=955190.0, ans=0.125 2024-08-11 06:10:09,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955190.0, ans=0.1 2024-08-11 06:10:13,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.20 vs. 
limit=15.0 2024-08-11 06:10:23,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=955290.0, ans=0.125 2024-08-11 06:10:32,989 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 06:10:43,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2024-08-11 06:10:44,346 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 06:10:48,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=955490.0, ans=0.02 2024-08-11 06:11:05,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8600, loss[loss=0.09088, beats_loss=0.009724, ecapa_loss=0.0002513, whisper_loss=0.07864, over 15845.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01129, ecapa_loss=0.0002066, whisper_loss=0.09515, over 3905106.77 frames. ], batch size: 64, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:11:07,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=955590.0, ans=0.125 2024-08-11 06:11:53,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. 
limit=22.5 2024-08-11 06:11:55,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=955890.0, ans=0.0 2024-08-11 06:12:11,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=955990.0, ans=0.125 2024-08-11 06:12:14,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.782e+01 3.171e+01 3.818e+01 6.085e+01, threshold=6.342e+01, percent-clipped=0.0 2024-08-11 06:12:28,120 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 06:12:28,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8650, loss[loss=0.1001, beats_loss=0.0119, ecapa_loss=0.0002447, whisper_loss=0.08578, over 22000.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01141, ecapa_loss=0.0002058, whisper_loss=0.09416, over 3883899.96 frames. ], batch size: 91, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:12:29,770 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 29 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-11 06:12:43,468 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 06:12:56,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=956190.0, ans=0.125 2024-08-11 06:13:16,307 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.615e-02 2024-08-11 06:13:24,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=956390.0, ans=0.2 2024-08-11 06:13:28,648 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 06:13:33,661 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 06:13:39,742 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 06:13:52,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8700, loss[loss=0.1329, beats_loss=0.01025, ecapa_loss=0.0001962, whisper_loss=0.1207, over 23481.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01133, ecapa_loss=0.0002074, whisper_loss=0.09493, over 3898942.70 frames. ], batch size: 93, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:14:06,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=956690.0, ans=0.125 2024-08-11 06:14:14,288 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.754e+05 2024-08-11 06:14:15,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=956690.0, ans=0.2 2024-08-11 06:14:31,555 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 06:14:44,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=956890.0, ans=0.125 2024-08-11 06:14:57,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.743e+01 3.051e+01 3.561e+01 4.836e+01, threshold=6.102e+01, percent-clipped=0.0 2024-08-11 06:15:11,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8750, loss[loss=0.08246, beats_loss=0.01362, ecapa_loss=0.0002173, whisper_loss=0.06668, over 20274.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0114, ecapa_loss=0.0002061, whisper_loss=0.09512, over 3923663.85 frames. ], batch size: 83, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:15:15,362 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 06:15:25,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2024-08-11 06:15:33,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=957190.0, ans=0.1 2024-08-11 06:16:28,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=957590.0, ans=0.2 2024-08-11 06:16:29,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8800, loss[loss=0.1249, beats_loss=0.00992, ecapa_loss=0.0001721, whisper_loss=0.1132, over 22766.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01137, ecapa_loss=0.0002072, whisper_loss=0.09489, over 3912332.12 frames. ], batch size: 87, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:16:36,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=957590.0, ans=0.0 2024-08-11 06:16:40,129 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-11 06:17:08,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=957790.0, ans=0.1 2024-08-11 06:17:16,615 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 06:17:27,590 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 06:17:33,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.553e+01 2.761e+01 3.256e+01 4.911e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-11 06:17:49,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8850, loss[loss=0.09375, beats_loss=0.01075, ecapa_loss=0.0002083, whisper_loss=0.08091, over 16706.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01145, ecapa_loss=0.000206, whisper_loss=0.09424, over 3907628.86 frames. ], batch size: 65, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:18:12,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2024-08-11 06:18:34,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=958290.0, ans=0.0 2024-08-11 06:18:43,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=958390.0, ans=0.125 2024-08-11 06:19:10,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8900, loss[loss=0.09662, beats_loss=0.01448, ecapa_loss=0.0001926, whisper_loss=0.08021, over 18502.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01149, ecapa_loss=0.0002048, whisper_loss=0.09389, over 3893163.88 frames. ], batch size: 75, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:19:18,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=958590.0, ans=0.0 2024-08-11 06:19:36,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=958690.0, ans=0.125 2024-08-11 06:19:47,973 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 06:19:48,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=958790.0, ans=0.0 2024-08-11 06:19:58,828 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 06:20:02,171 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 06:20:12,066 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 06:20:13,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.703e+01 3.133e+01 3.628e+01 5.499e+01, threshold=6.267e+01, percent-clipped=0.0 2024-08-11 06:20:26,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 8950, loss[loss=0.1234, beats_loss=0.01009, ecapa_loss=0.000188, whisper_loss=0.1114, over 22639.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0002046, whisper_loss=0.09349, over 3872912.85 frames. ], batch size: 87, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:20:30,207 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 06:20:46,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:20:49,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:21:12,347 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 06:21:16,885 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-11 06:21:21,509 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-11 06:21:21,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=959390.0, ans=0.2 2024-08-11 06:21:24,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=959390.0, ans=0.2 2024-08-11 06:21:30,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=959490.0, ans=0.0 2024-08-11 06:21:35,307 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 06:21:40,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9000, loss[loss=0.08228, beats_loss=0.01393, ecapa_loss=0.0002439, whisper_loss=0.06591, over 19699.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.0002055, whisper_loss=0.09335, over 3890810.22 frames. ], batch size: 88, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:21:40,488 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 06:22:22,427 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on ASR_libri: loss=0.2572, beats_loss=0, ecapa_loss=0.0006695, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 06:22:31,556 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1648, 2.8178, 2.7419, 2.2663], device='cuda:3') 2024-08-11 06:22:33,001 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6951, 4.1951, 4.3361, 4.5797], device='cuda:3') 2024-08-11 06:22:40,926 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on SV_voxceleb1: loss=0.005671, beats_loss=0, ecapa_loss=0.0005671, whisper_loss=0, over 939242.00 frames. 2024-08-11 06:24:43,643 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 06:24:43,653 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 06:24:45,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=959590.0, ans=0.125 2024-08-11 06:24:45,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=959590.0, ans=0.125 2024-08-11 06:24:46,272 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
11 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 06:24:49,114 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 06:25:27,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=959890.0, ans=0.0 2024-08-11 06:25:27,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2024-08-11 06:25:33,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=959890.0, ans=0.1 2024-08-11 06:25:37,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=959890.0, ans=0.1 2024-08-11 06:25:41,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=959890.0, ans=0.1 2024-08-11 06:25:49,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.675e+01 2.932e+01 3.308e+01 5.321e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 06:26:03,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9050, loss[loss=0.1263, beats_loss=0.01004, ecapa_loss=0.0002073, whisper_loss=0.1142, over 20743.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01143, ecapa_loss=0.0002059, whisper_loss=0.09381, over 3877700.40 frames. ], batch size: 81, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:26:33,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=960190.0, ans=0.0 2024-08-11 06:26:35,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=960190.0, ans=0.125 2024-08-11 06:26:46,758 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 06:26:57,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=960290.0, ans=0.125 2024-08-11 06:27:04,607 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-11 06:27:04,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=960390.0, ans=0.125 2024-08-11 06:27:08,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-08-11 06:27:13,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=960390.0, ans=0.125 2024-08-11 06:27:17,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-11 06:27:31,546 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 06:27:32,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9100, loss[loss=0.1012, beats_loss=0.01207, ecapa_loss=0.0002179, whisper_loss=0.08696, over 21410.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002056, whisper_loss=0.09391, over 3897030.52 frames. ], batch size: 88, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:27:52,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=960690.0, ans=0.2 2024-08-11 06:28:04,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0 2024-08-11 06:28:14,366 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
35 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 06:28:31,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2024-08-11 06:28:52,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.825e+01 3.107e+01 3.810e+01 5.498e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 06:29:02,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=960990.0, ans=0.0 2024-08-11 06:29:03,713 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 06:29:10,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9150, loss[loss=0.09033, beats_loss=0.01482, ecapa_loss=0.0001667, whisper_loss=0.07384, over 15553.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002068, whisper_loss=0.09394, over 3895592.35 frames. ], batch size: 62, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:29:33,240 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 06:29:52,559 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 06:29:54,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=961290.0, ans=0.125 2024-08-11 06:30:09,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=961390.0, ans=0.125 2024-08-11 06:30:15,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=961390.0, ans=0.125 2024-08-11 06:30:21,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=961390.0, ans=0.0 2024-08-11 06:30:43,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9200, loss[loss=0.1053, beats_loss=0.0094, ecapa_loss=0.0002123, whisper_loss=0.09375, over 16681.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01146, ecapa_loss=0.0002075, whisper_loss=0.09306, over 3873023.90 frames. ], batch size: 67, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:31:04,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=961690.0, ans=0.125 2024-08-11 06:31:09,590 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 06:32:06,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.686e+01 3.168e+01 3.590e+01 6.490e+01, threshold=6.336e+01, percent-clipped=1.0 2024-08-11 06:32:26,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9250, loss[loss=0.1148, beats_loss=0.009404, ecapa_loss=0.0002223, whisper_loss=0.1031, over 22504.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002085, whisper_loss=0.09317, over 3902937.11 frames. 
], batch size: 90, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:32:32,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:32:37,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:33:01,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=962190.0, ans=0.125 2024-08-11 06:33:09,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2024-08-11 06:33:11,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=962290.0, ans=0.2 2024-08-11 06:33:13,984 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 06:33:16,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=962290.0, ans=0.1 2024-08-11 06:33:23,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=962390.0, ans=0.0 2024-08-11 06:33:23,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5 2024-08-11 06:33:26,466 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
28 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 06:33:48,941 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.619e+05 2024-08-11 06:33:49,755 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9300, loss[loss=0.1045, beats_loss=0.01264, ecapa_loss=0.0001666, whisper_loss=0.09019, over 17859.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01143, ecapa_loss=0.0002085, whisper_loss=0.09332, over 3906419.07 frames. ], batch size: 65, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:33:50,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=962590.0, ans=0.0 2024-08-11 06:33:58,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=962590.0, ans=0.0 2024-08-11 06:34:01,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=962590.0, ans=0.125 2024-08-11 06:34:08,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=962690.0, ans=0.0 2024-08-11 06:34:09,502 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-11 06:34:10,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-11 06:34:26,897 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 06:34:34,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. 
limit=15.0 2024-08-11 06:34:50,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.791e+01 3.053e+01 3.524e+01 6.115e+01, threshold=6.107e+01, percent-clipped=0.0 2024-08-11 06:34:57,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=962990.0, ans=0.95 2024-08-11 06:35:03,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9350, loss[loss=0.1122, beats_loss=0.01149, ecapa_loss=0.0001957, whisper_loss=0.09879, over 18489.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01147, ecapa_loss=0.0002087, whisper_loss=0.09277, over 3895536.94 frames. ], batch size: 72, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:35:18,826 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 06:35:57,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=963390.0, ans=0.125 2024-08-11 06:36:08,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=963490.0, ans=0.2 2024-08-11 06:36:11,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=963490.0, ans=0.5 2024-08-11 06:36:15,455 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 06:36:17,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9400, loss[loss=0.1187, beats_loss=0.01007, ecapa_loss=0.0002244, whisper_loss=0.1064, over 22141.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01144, ecapa_loss=0.0002084, whisper_loss=0.09306, over 3908834.65 frames. 
], batch size: 90, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:36:28,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=963590.0, ans=22.5 2024-08-11 06:36:38,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=963690.0, ans=0.2 2024-08-11 06:36:41,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=963690.0, ans=0.0 2024-08-11 06:36:45,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=12.0 2024-08-11 06:36:55,493 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 06:37:01,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=963890.0, ans=0.125 2024-08-11 06:37:04,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=963890.0, ans=0.2 2024-08-11 06:37:07,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2024-08-11 06:37:15,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=963890.0, ans=0.0 2024-08-11 06:37:18,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.687e+01 3.013e+01 3.513e+01 7.296e+01, threshold=6.026e+01, percent-clipped=1.0 2024-08-11 06:37:25,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=963990.0, ans=0.125 2024-08-11 06:37:30,329 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 06:37:30,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-11 06:37:32,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9450, loss[loss=0.09705, beats_loss=0.01088, ecapa_loss=0.0002312, whisper_loss=0.08385, over 14666.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01148, ecapa_loss=0.0002099, whisper_loss=0.09269, over 3884210.55 frames. ], batch size: 60, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:37:34,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=964090.0, ans=0.09899494936611666 2024-08-11 06:37:44,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=964090.0, ans=0.125 2024-08-11 06:37:44,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=964090.0, ans=0.07 2024-08-11 06:37:52,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=964190.0, ans=15.0 2024-08-11 06:38:10,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=964290.0, ans=0.125 2024-08-11 06:38:19,250 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 06:38:24,013 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 06:38:24,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=964390.0, ans=15.0 2024-08-11 06:38:27,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.40 vs. 
limit=5.0 2024-08-11 06:38:46,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=964490.0, ans=0.95 2024-08-11 06:38:48,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9500, loss[loss=0.08996, beats_loss=0.01204, ecapa_loss=0.000228, whisper_loss=0.07563, over 21727.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01152, ecapa_loss=0.0002104, whisper_loss=0.09209, over 3878274.19 frames. ], batch size: 94, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:39:19,548 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 06:39:29,558 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 06:39:41,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-11 06:39:44,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=964890.0, ans=0.2 2024-08-11 06:39:50,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2024-08-11 06:39:50,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.745e+01 3.159e+01 3.801e+01 1.108e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 06:39:56,779 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 16 from Vox, 55 fro AS 2024-08-11 06:40:00,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=964990.0, ans=0.125 2024-08-11 06:40:03,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9550, loss[loss=0.1208, beats_loss=0.01151, ecapa_loss=0.0001898, whisper_loss=0.1074, over 19081.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.0115, ecapa_loss=0.0002097, whisper_loss=0.09202, over 3892568.24 frames. ], batch size: 74, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:40:16,774 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 06:40:18,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=965190.0, ans=0.125 2024-08-11 06:40:26,531 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 06:40:35,187 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 06:40:37,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965290.0, ans=0.1 2024-08-11 06:40:38,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=965290.0, ans=0.125 2024-08-11 06:40:45,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=965390.0, ans=0.0 2024-08-11 06:40:48,399 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 06:40:51,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=965390.0, ans=0.0 2024-08-11 06:40:57,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2024-08-11 06:41:09,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2024-08-11 06:41:11,382 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 06:41:13,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9600, loss[loss=0.09608, beats_loss=0.0113, ecapa_loss=0.0002258, whisper_loss=0.08253, over 18109.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01148, ecapa_loss=0.0002091, whisper_loss=0.09245, over 3868385.06 frames. ], batch size: 75, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:41:21,019 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 06:41:33,206 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 06:41:44,004 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 06:41:45,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=965790.0, ans=0.125 2024-08-11 06:41:53,882 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 06:42:01,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=965890.0, ans=0.2 2024-08-11 06:42:03,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=965890.0, ans=0.1 2024-08-11 06:42:06,849 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 06:42:11,615 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 06:42:14,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.765e+01 3.049e+01 3.383e+01 4.788e+01, threshold=6.099e+01, percent-clipped=0.0 2024-08-11 06:42:17,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=965990.0, ans=0.1 2024-08-11 06:42:27,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966090.0, ans=0.125 2024-08-11 06:42:28,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9650, loss[loss=0.09547, beats_loss=0.01384, ecapa_loss=0.0001829, whisper_loss=0.0798, over 16428.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01149, ecapa_loss=0.000208, whisper_loss=0.09253, over 3839924.36 frames. ], batch size: 66, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:42:58,870 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 06:43:05,478 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 06:43:10,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=966290.0, ans=0.125 2024-08-11 06:43:10,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=966290.0, ans=0.0 2024-08-11 06:43:13,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.45 vs. limit=10.0 2024-08-11 06:43:19,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=966390.0, ans=0.0 2024-08-11 06:43:32,017 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 06:43:32,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:33,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:43,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9700, loss[loss=0.1279, beats_loss=0.01137, ecapa_loss=0.0001687, whisper_loss=0.1148, over 14464.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01149, ecapa_loss=0.0002083, whisper_loss=0.09306, over 3864653.16 frames. ], batch size: 55, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:43:55,245 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 06:43:55,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=966590.0, ans=0.125 2024-08-11 06:44:09,409 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 06:44:42,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.608e+01 2.892e+01 3.245e+01 5.119e+01, threshold=5.784e+01, percent-clipped=0.0 2024-08-11 06:44:55,499 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9750, loss[loss=0.09432, beats_loss=0.01184, ecapa_loss=0.0002567, whisper_loss=0.07991, over 22181.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01145, ecapa_loss=0.0002064, whisper_loss=0.0933, over 3864703.02 frames. ], batch size: 92, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:45:02,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. 
limit=6.0 2024-08-11 06:45:05,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=967090.0, ans=0.125 2024-08-11 06:45:07,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=967090.0, ans=0.0 2024-08-11 06:45:25,398 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 06:45:44,976 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 06:45:45,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=967390.0, ans=0.125 2024-08-11 06:45:47,389 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 06:45:48,824 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 06:46:07,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9800, loss[loss=0.102, beats_loss=0.0118, ecapa_loss=0.0001997, whisper_loss=0.08816, over 21649.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01152, ecapa_loss=0.0002059, whisper_loss=0.09279, over 3856849.61 frames. ], batch size: 89, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:46:51,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=967890.0, ans=0.125 2024-08-11 06:47:02,748 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 06:47:03,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=967890.0, ans=0.125 2024-08-11 06:47:06,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.643e+01 2.929e+01 3.455e+01 6.415e+01, threshold=5.858e+01, percent-clipped=3.0 2024-08-11 06:47:16,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-11 06:47:19,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9850, loss[loss=0.1185, beats_loss=0.009982, ecapa_loss=0.000156, whisper_loss=0.107, over 18579.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01153, ecapa_loss=0.0002044, whisper_loss=0.09307, over 3864704.82 frames. ], batch size: 68, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:47:21,782 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-11 06:48:01,481 INFO [train_multi_KD3.py:844] (3/4) A total of 98 cuts. 24 from LS+wenet, 23 from Vox, 51 fro AS 2024-08-11 06:48:01,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968290.0, ans=0.1 2024-08-11 06:48:11,722 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-11 06:48:17,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2024-08-11 06:48:27,158 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 06:48:34,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9900, loss[loss=0.1032, beats_loss=0.009793, ecapa_loss=0.0002349, whisper_loss=0.09101, over 19571.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01153, ecapa_loss=0.000204, whisper_loss=0.09317, over 3881580.33 frames. ], batch size: 82, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:48:45,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=968590.0, ans=0.125 2024-08-11 06:48:46,531 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 06:48:52,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=968690.0, ans=0.2 2024-08-11 06:48:58,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=968690.0, ans=0.125 2024-08-11 06:49:04,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=968790.0, ans=0.0 2024-08-11 06:49:14,029 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 06:49:14,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=968790.0, ans=0.0 2024-08-11 06:49:18,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=968890.0, ans=0.0 2024-08-11 06:49:25,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=968890.0, ans=0.0 2024-08-11 06:49:31,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=968990.0, ans=0.125 2024-08-11 06:49:32,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.797e+01 3.066e+01 3.610e+01 6.025e+01, threshold=6.133e+01, percent-clipped=2.0 2024-08-11 06:49:36,804 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
14 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 06:49:43,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=968990.0, ans=0.125 2024-08-11 06:49:45,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 9950, loss[loss=0.1194, beats_loss=0.01006, ecapa_loss=0.0002203, whisper_loss=0.1071, over 22964.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01161, ecapa_loss=0.0002036, whisper_loss=0.09242, over 3879208.42 frames. ], batch size: 92, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:49:52,761 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 06:49:56,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.09 vs. limit=15.0 2024-08-11 06:49:59,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=969190.0, ans=0.0 2024-08-11 06:50:03,411 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 06:50:07,589 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 06:50:25,808 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 06:50:44,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=15.0 2024-08-11 06:50:58,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10000, loss[loss=0.114, beats_loss=0.009692, ecapa_loss=0.0001995, whisper_loss=0.1024, over 22274.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01154, ecapa_loss=0.0002041, whisper_loss=0.09307, over 3898150.00 frames. 
], batch size: 88, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:50:58,481 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 06:50:59,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.59 vs. limit=22.5 2024-08-11 06:51:17,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=969690.0, ans=0.125 2024-08-11 06:51:32,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.26 vs. limit=15.0 2024-08-11 06:51:37,316 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 06:51:47,063 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 06:51:56,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.627e+01 2.974e+01 3.477e+01 5.733e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-11 06:51:59,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=969990.0, ans=0.0 2024-08-11 06:52:09,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10050, loss[loss=0.1046, beats_loss=0.01212, ecapa_loss=0.0001857, whisper_loss=0.09064, over 18516.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01148, ecapa_loss=0.0002054, whisper_loss=0.0935, over 3877483.54 frames. 
], batch size: 74, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:52:31,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=970190.0, ans=0.0 2024-08-11 06:52:42,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-11 06:52:47,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=970290.0, ans=0.125 2024-08-11 06:52:53,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=970390.0, ans=0.0 2024-08-11 06:53:12,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=970490.0, ans=0.125 2024-08-11 06:53:18,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10100, loss[loss=0.1039, beats_loss=0.01331, ecapa_loss=0.0001937, whisper_loss=0.08862, over 22538.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01148, ecapa_loss=0.000205, whisper_loss=0.09419, over 3907849.37 frames. ], batch size: 91, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:53:18,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970590.0, ans=0.1 2024-08-11 06:53:19,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=970590.0, ans=0.125 2024-08-11 06:53:22,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=970590.0, ans=0.2 2024-08-11 06:53:36,912 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.611e-01 2024-08-11 06:53:44,080 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 06:53:45,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=970790.0, ans=0.0 2024-08-11 06:53:49,402 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 7 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 06:54:02,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=970890.0, ans=0.1 2024-08-11 06:54:11,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.833e+01 3.189e+01 3.704e+01 6.701e+01, threshold=6.379e+01, percent-clipped=2.0 2024-08-11 06:54:23,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10150, loss[loss=0.135, beats_loss=0.009771, ecapa_loss=0.0001915, whisper_loss=0.1233, over 21961.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01132, ecapa_loss=0.0002081, whisper_loss=0.09458, over 3887290.17 frames. ], batch size: 81, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:54:24,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=971090.0, ans=0.125 2024-08-11 06:54:25,777 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 06:54:34,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=971090.0, ans=0.0 2024-08-11 06:54:38,902 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 06:55:06,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971390.0, ans=0.1 2024-08-11 06:55:28,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. 
limit=15.0 2024-08-11 06:55:28,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10200, loss[loss=0.1068, beats_loss=0.01354, ecapa_loss=0.0002044, whisper_loss=0.09127, over 21647.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01137, ecapa_loss=0.0002089, whisper_loss=0.09409, over 3886654.74 frames. ], batch size: 92, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:55:33,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-11 06:55:42,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=971690.0, ans=0.07 2024-08-11 06:55:45,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-11 06:55:53,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=971690.0, ans=0.2 2024-08-11 06:56:03,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=971790.0, ans=0.125 2024-08-11 06:56:04,231 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 06:56:07,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=971890.0, ans=0.0 2024-08-11 06:56:17,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-08-11 06:56:20,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.41 vs. 
limit=15.0 2024-08-11 06:56:22,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.593e+01 3.063e+01 3.580e+01 1.842e+02, threshold=6.125e+01, percent-clipped=1.0 2024-08-11 06:56:31,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=971990.0, ans=0.125 2024-08-11 06:56:31,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0 2024-08-11 06:56:33,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10250, loss[loss=0.09833, beats_loss=0.01319, ecapa_loss=0.0002322, whisper_loss=0.08282, over 15725.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01137, ecapa_loss=0.0002086, whisper_loss=0.09439, over 3888989.21 frames. ], batch size: 65, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:56:43,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=972090.0, ans=0.125 2024-08-11 06:56:46,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=972190.0, ans=0.125 2024-08-11 06:56:48,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=972190.0, ans=0.0 2024-08-11 06:56:52,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=972190.0, ans=0.125 2024-08-11 06:56:59,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-11 06:57:04,088 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-11 06:57:07,810 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 06:57:30,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=972490.0, ans=0.0 2024-08-11 06:57:31,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=972490.0, ans=0.05 2024-08-11 06:57:36,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=972490.0, ans=0.1 2024-08-11 06:57:38,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10300, loss[loss=0.1072, beats_loss=0.01211, ecapa_loss=0.0002196, whisper_loss=0.0929, over 21944.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01143, ecapa_loss=0.0002067, whisper_loss=0.09362, over 3902388.66 frames. ], batch size: 91, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:57:57,820 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 06:58:01,865 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 06:58:02,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2024-08-11 06:58:05,581 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-11 06:58:31,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.762e+01 3.121e+01 3.725e+01 5.735e+01, threshold=6.242e+01, percent-clipped=0.0 2024-08-11 06:58:32,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-08-11 06:58:42,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2024-08-11 06:58:42,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10350, loss[loss=0.11, beats_loss=0.0103, ecapa_loss=0.0002257, whisper_loss=0.09743, over 21685.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002064, whisper_loss=0.09395, over 3929435.06 frames. ], batch size: 88, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:58:43,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=973090.0, ans=0.1 2024-08-11 06:58:44,207 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 06:59:04,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=973190.0, ans=0.0 2024-08-11 06:59:14,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=973290.0, ans=0.2 2024-08-11 06:59:23,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-08-11 06:59:28,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=973390.0, ans=0.125 2024-08-11 06:59:35,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=973490.0, ans=0.07 2024-08-11 06:59:40,354 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 06:59:48,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10400, loss[loss=0.1152, beats_loss=0.01073, ecapa_loss=0.0002649, whisper_loss=0.1018, over 19250.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002062, whisper_loss=0.09382, over 3903698.76 frames. ], batch size: 79, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:00:00,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-08-11 07:00:06,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=973690.0, ans=0.1 2024-08-11 07:00:08,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-08-11 07:00:14,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=973790.0, ans=0.125 2024-08-11 07:00:42,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.630e+01 2.925e+01 3.255e+01 4.896e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-11 07:00:44,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=973990.0, ans=0.125 2024-08-11 07:00:50,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=973990.0, ans=0.2 2024-08-11 07:00:53,657 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10450, loss[loss=0.1073, beats_loss=0.01121, ecapa_loss=0.0002514, whisper_loss=0.09361, over 21020.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01149, ecapa_loss=0.0002061, whisper_loss=0.09342, over 3867862.87 frames. 
], batch size: 89, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:00:54,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=974090.0, ans=0.125 2024-08-11 07:01:27,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=15.0 2024-08-11 07:01:54,693 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 07:02:02,219 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10500, loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0002096, whisper_loss=0.09141, over 21065.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01152, ecapa_loss=0.0002041, whisper_loss=0.09315, over 3861396.42 frames. ], batch size: 83, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:02:08,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=974590.0, ans=0.05 2024-08-11 07:02:19,073 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 07:02:19,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=974690.0, ans=0.125 2024-08-11 07:02:25,756 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 07:02:27,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=974690.0, ans=0.125 2024-08-11 07:02:34,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=974790.0, ans=0.125 2024-08-11 07:02:50,111 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 07:02:50,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2024-08-11 07:02:57,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.661e+01 2.970e+01 3.368e+01 5.123e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-11 07:03:10,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10550, loss[loss=0.08703, beats_loss=0.0126, ecapa_loss=0.0002203, whisper_loss=0.07223, over 22349.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.000205, whisper_loss=0.09335, over 3843147.79 frames. ], batch size: 94, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:03:17,826 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 07:03:19,436 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 07:03:27,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=975190.0, ans=0.0 2024-08-11 07:03:37,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=975290.0, ans=0.125 2024-08-11 07:03:39,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=975290.0, ans=0.125 2024-08-11 07:03:59,050 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 07:04:05,898 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 07:04:09,177 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 07:04:16,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=975490.0, ans=0.0 2024-08-11 07:04:18,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10600, loss[loss=0.1168, beats_loss=0.0108, ecapa_loss=0.0001898, whisper_loss=0.1041, over 23655.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01152, ecapa_loss=0.0002051, whisper_loss=0.09341, over 3882666.31 frames. ], batch size: 89, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:04:23,466 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 07:04:27,588 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 07:04:29,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=975590.0, ans=0.125 2024-08-11 07:04:30,462 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 07:04:37,210 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 07:04:42,373 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 07:04:45,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=975790.0, ans=0.0 2024-08-11 07:04:59,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=975890.0, ans=0.0 2024-08-11 07:05:00,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=975890.0, ans=0.2 2024-08-11 07:05:11,060 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:05:11,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=12.0 2024-08-11 07:05:11,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.786e+01 3.038e+01 3.518e+01 8.413e+01, threshold=6.076e+01, percent-clipped=1.0 2024-08-11 07:05:23,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10650, loss[loss=0.1115, beats_loss=0.008691, ecapa_loss=0.0002403, whisper_loss=0.1004, over 14646.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01156, ecapa_loss=0.0002038, whisper_loss=0.0925, over 3871287.17 frames. ], batch size: 58, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:05:25,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=976090.0, ans=0.125 2024-08-11 07:05:41,277 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 07:05:50,640 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 07:05:57,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=976290.0, ans=0.2 2024-08-11 07:06:01,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=976290.0, ans=0.2 2024-08-11 07:06:02,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=976390.0, ans=0.1 2024-08-11 07:06:07,288 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 07:06:30,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10700, loss[loss=0.1231, beats_loss=0.009102, ecapa_loss=0.0001778, whisper_loss=0.1123, over 24845.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.0002039, whisper_loss=0.09343, over 3884711.21 frames. ], batch size: 93, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:06:59,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=976790.0, ans=0.0 2024-08-11 07:07:06,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=976790.0, ans=0.125 2024-08-11 07:07:23,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=976990.0, ans=0.0 2024-08-11 07:07:24,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.712e+01 3.090e+01 3.800e+01 9.134e+01, threshold=6.180e+01, percent-clipped=2.0 2024-08-11 07:07:35,145 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 07:07:36,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10750, loss[loss=0.1315, beats_loss=0.01164, ecapa_loss=0.0002289, whisper_loss=0.1175, over 21423.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0002048, whisper_loss=0.09422, over 3902095.91 frames. ], batch size: 86, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:07:39,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=977090.0, ans=0.07 2024-08-11 07:07:48,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=977190.0, ans=0.125 2024-08-11 07:07:59,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=977190.0, ans=0.125 2024-08-11 07:08:02,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=977290.0, ans=0.125 2024-08-11 07:08:23,403 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 07:08:39,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=977490.0, ans=0.125 2024-08-11 07:08:41,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=977490.0, ans=0.125 2024-08-11 07:08:43,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10800, loss[loss=0.1026, beats_loss=0.01244, ecapa_loss=0.0001784, whisper_loss=0.08833, over 22532.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01154, ecapa_loss=0.0002028, whisper_loss=0.09434, over 3909808.45 frames. 
], batch size: 90, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:08:52,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=977590.0, ans=0.1 2024-08-11 07:08:54,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=977590.0, ans=0.1 2024-08-11 07:09:30,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=977890.0, ans=0.025 2024-08-11 07:09:33,977 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 07:09:39,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.607e+01 2.912e+01 3.510e+01 6.638e+01, threshold=5.825e+01, percent-clipped=1.0 2024-08-11 07:09:46,527 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 07:09:49,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=977990.0, ans=0.125 2024-08-11 07:09:51,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10850, loss[loss=0.1225, beats_loss=0.01019, ecapa_loss=0.0002396, whisper_loss=0.11, over 22501.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01155, ecapa_loss=0.0002048, whisper_loss=0.0946, over 3920721.27 frames. ], batch size: 92, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:09:56,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=978090.0, ans=0.0 2024-08-11 07:10:10,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=978190.0, ans=0.0 2024-08-11 07:10:15,419 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
35 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 07:10:17,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978190.0, ans=0.1 2024-08-11 07:10:25,878 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 07:10:28,756 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 07:10:31,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=978390.0, ans=0.125 2024-08-11 07:10:35,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-11 07:10:39,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-11 07:10:42,012 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 07:10:59,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10900, loss[loss=0.1356, beats_loss=0.00925, ecapa_loss=0.0002193, whisper_loss=0.1241, over 23203.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01152, ecapa_loss=0.0002034, whisper_loss=0.09459, over 3934815.99 frames. ], batch size: 90, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:11:04,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2024-08-11 07:11:23,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=978690.0, ans=15.0 2024-08-11 07:11:28,493 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 07:11:34,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=978790.0, ans=0.5 2024-08-11 07:11:55,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.834e+01 3.154e+01 3.675e+01 5.808e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 07:11:55,532 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 07:12:01,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978990.0, ans=0.1 2024-08-11 07:12:02,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-11 07:12:05,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978990.0, ans=0.0 2024-08-11 07:12:07,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 10950, loss[loss=0.1129, beats_loss=0.01294, ecapa_loss=0.000228, whisper_loss=0.09769, over 21550.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01141, ecapa_loss=0.0002036, whisper_loss=0.09499, over 3929512.29 frames. ], batch size: 89, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:12:30,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-11 07:12:34,943 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.835e-01 2024-08-11 07:12:42,263 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 07:12:57,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=979390.0, ans=0.0 2024-08-11 07:13:13,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11000, loss[loss=0.1014, beats_loss=0.01115, ecapa_loss=0.0002185, whisper_loss=0.08805, over 18289.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01148, ecapa_loss=0.0002038, whisper_loss=0.0943, over 3927677.48 frames. ], batch size: 74, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:13:18,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0 2024-08-11 07:13:40,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=979790.0, ans=0.125 2024-08-11 07:13:49,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=979790.0, ans=0.07 2024-08-11 07:14:08,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.630e+01 2.984e+01 3.392e+01 5.712e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 07:14:10,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=15.0 2024-08-11 07:14:19,813 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 07:14:20,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11050, loss[loss=0.1191, beats_loss=0.009494, ecapa_loss=0.0002382, whisper_loss=0.1072, over 20899.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01141, ecapa_loss=0.0002067, whisper_loss=0.09429, over 3929199.83 frames. 
], batch size: 84, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:14:29,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=980090.0, ans=0.125 2024-08-11 07:14:38,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=980190.0, ans=0.2 2024-08-11 07:14:42,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=980190.0, ans=0.125 2024-08-11 07:14:43,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=980190.0, ans=0.125 2024-08-11 07:14:43,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=980190.0, ans=0.0 2024-08-11 07:14:53,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=980290.0, ans=0.2 2024-08-11 07:14:56,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=980290.0, ans=0.125 2024-08-11 07:15:04,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=980390.0, ans=0.0 2024-08-11 07:15:10,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=980390.0, ans=0.125 2024-08-11 07:15:26,587 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 07:15:28,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11100, loss[loss=0.09811, beats_loss=0.01243, ecapa_loss=0.0002106, whisper_loss=0.08357, over 23107.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0002078, whisper_loss=0.09431, over 3950872.97 frames. 
], batch size: 94, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:15:28,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2024-08-11 07:15:35,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-11 07:16:23,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.722e+01 3.049e+01 3.591e+01 6.029e+01, threshold=6.098e+01, percent-clipped=1.0 2024-08-11 07:16:25,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=980990.0, ans=0.125 2024-08-11 07:16:36,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11150, loss[loss=0.08955, beats_loss=0.01113, ecapa_loss=0.0002075, whisper_loss=0.07634, over 15765.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01126, ecapa_loss=0.0002083, whisper_loss=0.09508, over 3926657.15 frames. 
], batch size: 61, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:16:36,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981090.0, ans=0.1 2024-08-11 07:16:42,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=981090.0, ans=0.0 2024-08-11 07:16:49,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=981190.0, ans=0.125 2024-08-11 07:16:50,674 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:16:50,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=981190.0, ans=0.1 2024-08-11 07:16:51,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=981190.0, ans=0.0 2024-08-11 07:16:54,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=981190.0, ans=0.0 2024-08-11 07:17:00,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-11 07:17:02,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=981290.0, ans=0.0 2024-08-11 07:17:07,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.83 vs. 
limit=15.0 2024-08-11 07:17:31,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=981490.0, ans=0.125 2024-08-11 07:17:35,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=981490.0, ans=0.125 2024-08-11 07:17:35,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=981490.0, ans=0.125 2024-08-11 07:17:41,396 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 07:17:43,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11200, loss[loss=0.07491, beats_loss=0.01462, ecapa_loss=0.0001617, whisper_loss=0.05867, over 15903.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0002075, whisper_loss=0.09404, over 3873346.34 frames. ], batch size: 63, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:17:50,515 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 07:18:00,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. 
limit=10.0 2024-08-11 07:18:03,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=981690.0, ans=0.0 2024-08-11 07:18:10,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=981790.0, ans=0.125 2024-08-11 07:18:17,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=981790.0, ans=0.125 2024-08-11 07:18:34,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=981890.0, ans=0.125 2024-08-11 07:18:35,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=981890.0, ans=0.0 2024-08-11 07:18:39,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.676e+01 2.993e+01 3.397e+01 5.977e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 07:18:40,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=981990.0, ans=0.125 2024-08-11 07:18:42,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=981990.0, ans=0.125 2024-08-11 07:18:43,233 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 07:18:50,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2024-08-11 07:18:51,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11250, loss[loss=0.09963, beats_loss=0.01038, ecapa_loss=0.0001866, whisper_loss=0.08739, over 15532.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01124, ecapa_loss=0.0002086, whisper_loss=0.09432, over 3847904.37 frames. 
], batch size: 59, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:19:01,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2024-08-11 07:19:12,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=982190.0, ans=0.1 2024-08-11 07:19:15,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=982190.0, ans=0.2 2024-08-11 07:19:15,977 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 07:19:25,142 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 07:19:31,227 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 07:19:47,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.88 vs. limit=15.0 2024-08-11 07:19:59,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11300, loss[loss=0.09329, beats_loss=0.01085, ecapa_loss=0.0001921, whisper_loss=0.08052, over 16100.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0113, ecapa_loss=0.0002069, whisper_loss=0.09441, over 3879012.27 frames. ], batch size: 64, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:20:15,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982690.0, ans=0.1 2024-08-11 07:20:15,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2024-08-11 07:20:19,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=982690.0, ans=0.1 2024-08-11 07:20:20,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982690.0, ans=0.1 2024-08-11 07:20:37,407 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 07:20:50,304 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 07:20:53,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.719e+01 3.008e+01 3.388e+01 1.679e+02, threshold=6.016e+01, percent-clipped=1.0 2024-08-11 07:21:05,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11350, loss[loss=0.1062, beats_loss=0.009905, ecapa_loss=0.0002424, whisper_loss=0.09386, over 16021.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01127, ecapa_loss=0.0002071, whisper_loss=0.09469, over 3864979.24 frames. ], batch size: 67, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:21:20,924 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 07:21:21,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=983190.0, ans=0.125 2024-08-11 07:21:24,664 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 07:21:32,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=983290.0, ans=0.09899494936611666 2024-08-11 07:21:36,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=983290.0, ans=0.125 2024-08-11 07:21:45,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=22.5 2024-08-11 07:21:59,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=983490.0, ans=0.07 2024-08-11 07:22:10,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11400, loss[loss=0.1196, beats_loss=0.01238, ecapa_loss=0.0001855, whisper_loss=0.1053, over 22661.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01139, ecapa_loss=0.0002063, whisper_loss=0.09476, over 3887654.05 frames. ], batch size: 91, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:22:10,319 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-11 07:22:30,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=983690.0, ans=0.0 2024-08-11 07:22:31,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=983690.0, ans=0.05 2024-08-11 07:22:33,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=983690.0, ans=0.125 2024-08-11 07:22:34,858 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 07:22:35,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. 
limit=10.0 2024-08-11 07:22:46,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2024-08-11 07:23:02,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.896e+01 3.252e+01 3.905e+01 6.465e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-11 07:23:07,673 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 07:23:08,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=983990.0, ans=0.015 2024-08-11 07:23:10,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=983990.0, ans=0.0 2024-08-11 07:23:13,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11450, loss[loss=0.1066, beats_loss=0.01226, ecapa_loss=0.0001924, whisper_loss=0.09239, over 23304.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01142, ecapa_loss=0.000207, whisper_loss=0.09438, over 3918703.52 frames. ], batch size: 94, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:23:14,405 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:23:15,371 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 07:23:20,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=984090.0, ans=0.125 2024-08-11 07:24:00,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=984390.0, ans=0.125 2024-08-11 07:24:00,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=984390.0, ans=0.0 2024-08-11 07:24:07,769 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 07:24:08,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.03 vs. limit=22.5 2024-08-11 07:24:09,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=984490.0, ans=0.07 2024-08-11 07:24:16,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=984490.0, ans=0.2 2024-08-11 07:24:23,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11500, loss[loss=0.1044, beats_loss=0.009626, ecapa_loss=0.0001868, whisper_loss=0.09293, over 16572.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01139, ecapa_loss=0.0002064, whisper_loss=0.09429, over 3912454.90 frames. ], batch size: 66, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:24:24,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=984590.0, ans=0.125 2024-08-11 07:24:44,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:25:13,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.33 vs. limit=22.5 2024-08-11 07:25:15,415 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 07:25:25,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=984890.0, ans=0.0 2024-08-11 07:25:25,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=984890.0, ans=15.0 2024-08-11 07:25:41,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0 2024-08-11 07:25:43,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=984990.0, ans=0.125 2024-08-11 07:25:45,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.765e+01 3.010e+01 3.592e+01 5.034e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-11 07:25:47,939 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 07:25:49,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2024-08-11 07:25:51,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984990.0, ans=0.1 2024-08-11 07:26:02,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=985090.0, ans=0.0 2024-08-11 07:26:02,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985090.0, ans=0.1 2024-08-11 07:26:03,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11550, loss[loss=0.1084, beats_loss=0.01293, ecapa_loss=0.0001842, whisper_loss=0.09366, over 22142.00 frames. 
], tot_loss[loss=0.1089, beats_loss=0.01132, ecapa_loss=0.0002067, whisper_loss=0.09551, over 3920004.04 frames. ], batch size: 90, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:26:10,968 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 from AS 2024-08-11 07:26:14,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-11 07:26:22,039 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 07:26:23,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=985190.0, ans=0.125 2024-08-11 07:26:27,664 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-11 07:27:28,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2024-08-11 07:27:29,906 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS 2024-08-11 07:27:35,392 INFO [train_multi_KD3.py:844] (3/4) A total of 99 cuts. 29 from LS+wenet, 27 from Vox, 43 from AS 2024-08-11 07:27:50,213 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 17 from Vox, 42 from AS 2024-08-11 07:27:52,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11600, loss[loss=0.111, beats_loss=0.01223, ecapa_loss=0.0002031, whisper_loss=0.09674, over 22147.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01136, ecapa_loss=0.0002066, whisper_loss=0.09522, over 3903681.94 frames. 
], batch size: 92, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:27:57,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=985590.0, ans=0.0 2024-08-11 07:28:14,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985590.0, ans=0.1 2024-08-11 07:28:15,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=985590.0, ans=22.5 2024-08-11 07:28:15,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-11 07:28:19,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=985690.0, ans=0.125 2024-08-11 07:28:29,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=985690.0, ans=0.0 2024-08-11 07:28:37,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-11 07:29:24,749 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS 2024-08-11 07:29:29,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.587e+01 2.898e+01 3.413e+01 5.144e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-11 07:29:33,634 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 from AS 2024-08-11 07:29:38,825 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 07:29:44,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11650, loss[loss=0.1062, beats_loss=0.01165, ecapa_loss=0.0002206, whisper_loss=0.09229, over 22501.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.0002074, whisper_loss=0.09439, over 3929287.58 frames. ], batch size: 91, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:29:55,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2024-08-11 07:30:20,220 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 07:30:25,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=986290.0, ans=0.0 2024-08-11 07:30:36,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-11 07:30:48,823 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-11 07:30:56,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-11 07:31:03,238 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 28 from Vox, 29 from AS 2024-08-11 07:31:13,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11700, loss[loss=0.1273, beats_loss=0.009832, ecapa_loss=0.0001989, whisper_loss=0.1155, over 17955.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01146, ecapa_loss=0.0002062, whisper_loss=0.09379, over 3935134.88 frames. 
], batch size: 66, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:31:25,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2024-08-11 07:31:51,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=986790.0, ans=0.125 2024-08-11 07:31:52,351 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 07:32:06,972 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 from AS 2024-08-11 07:32:24,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.842e+01 3.149e+01 3.845e+01 7.778e+01, threshold=6.297e+01, percent-clipped=3.0 2024-08-11 07:32:26,274 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.324e+02 2024-08-11 07:32:39,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11750, loss[loss=0.09806, beats_loss=0.01046, ecapa_loss=0.0002081, whisper_loss=0.08552, over 17163.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01149, ecapa_loss=0.000206, whisper_loss=0.0938, over 3944249.55 frames. ], batch size: 66, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:33:40,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987390.0, ans=0.1 2024-08-11 07:33:44,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=987390.0, ans=0.0 2024-08-11 07:33:47,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. 
limit=15.0 2024-08-11 07:33:53,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=987490.0, ans=0.0 2024-08-11 07:33:54,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=987490.0, ans=0.0 2024-08-11 07:33:54,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=987490.0, ans=0.125 2024-08-11 07:34:04,729 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 07:34:08,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=987590.0, ans=0.0 2024-08-11 07:34:09,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11800, loss[loss=0.1149, beats_loss=0.01135, ecapa_loss=0.0002178, whisper_loss=0.1014, over 22447.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01149, ecapa_loss=0.0002063, whisper_loss=0.09393, over 3933412.79 frames. ], batch size: 89, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:34:20,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=987590.0, ans=0.07 2024-08-11 07:34:22,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=987590.0, ans=0.125 2024-08-11 07:34:39,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-08-11 07:34:50,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=987790.0, ans=0.0 2024-08-11 07:34:52,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=987790.0, ans=0.0 2024-08-11 07:35:02,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=987890.0, ans=0.0 2024-08-11 07:35:18,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.720e+01 3.073e+01 3.423e+01 3.198e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 07:35:23,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=987990.0, ans=0.125 2024-08-11 07:35:24,731 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 07:35:36,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11850, loss[loss=0.1031, beats_loss=0.01252, ecapa_loss=0.0001924, whisper_loss=0.08861, over 17781.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01155, ecapa_loss=0.000204, whisper_loss=0.09439, over 3946227.41 frames. ], batch size: 69, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:35:41,640 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 07:36:16,089 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS 2024-08-11 07:36:24,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.14 vs. 
limit=6.0 2024-08-11 07:36:51,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=988490.0, ans=0.125 2024-08-11 07:36:53,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988490.0, ans=0.1 2024-08-11 07:36:59,785 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:37:01,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11900, loss[loss=0.1322, beats_loss=0.01059, ecapa_loss=0.0001965, whisper_loss=0.1197, over 23960.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01159, ecapa_loss=0.0002039, whisper_loss=0.0943, over 3948251.59 frames. ], batch size: 92, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:37:08,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=988590.0, ans=0.125 2024-08-11 07:37:21,669 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 07:37:23,299 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 from AS 2024-08-11 07:38:05,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=988990.0, ans=0.125 2024-08-11 07:38:06,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.737e+01 3.168e+01 3.571e+01 8.955e+01, threshold=6.335e+01, percent-clipped=2.0 2024-08-11 07:38:19,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=989090.0, ans=0.0 2024-08-11 07:38:20,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 11950, loss[loss=0.1049, beats_loss=0.01157, ecapa_loss=0.0002171, whisper_loss=0.09113, over 22254.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01154, ecapa_loss=0.0002049, whisper_loss=0.09424, over 3910407.28 frames. ], batch size: 92, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:38:22,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=989090.0, ans=0.0 2024-08-11 07:38:22,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=989090.0, ans=22.5 2024-08-11 07:38:26,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=989090.0, ans=0.125 2024-08-11 07:38:29,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=989090.0, ans=0.0 2024-08-11 07:38:32,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=989090.0, ans=0.125 2024-08-11 07:38:32,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=989090.0, ans=0.025 2024-08-11 07:38:33,700 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 07:38:33,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=989090.0, ans=0.125 2024-08-11 07:38:36,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=989190.0, ans=0.0 2024-08-11 07:38:38,915 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
23 from LS+wenet, 14 from Vox, 18 from AS 2024-08-11 07:38:40,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=989190.0, ans=0.0 2024-08-11 07:38:47,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=989190.0, ans=0.125 2024-08-11 07:39:02,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=989290.0, ans=0.0 2024-08-11 07:39:10,264 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 15 from Vox, 19 from AS 2024-08-11 07:39:11,818 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-11 07:39:19,614 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 from AS 2024-08-11 07:39:30,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=989490.0, ans=0.125 2024-08-11 07:39:37,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12000, loss[loss=0.08812, beats_loss=0.01357, ecapa_loss=0.0002631, whisper_loss=0.07191, over 21175.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01148, ecapa_loss=0.0002058, whisper_loss=0.0943, over 3867645.57 frames. ], batch size: 94, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:39:37,877 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 07:40:13,127 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006674, whisper_loss=0.252, over 922467.00 frames. 2024-08-11 07:40:32,458 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on SV_voxceleb1: loss=0.005495, beats_loss=0, ecapa_loss=0.0005495, whisper_loss=0, over 939242.00 frames. 
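The per-batch and validation entries above report a total loss alongside its three distillation components. Using the loss scales from the run config in the header (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0), the logged "loss=" figure can be reproduced as a weighted sum. This is a minimal illustrative sketch with a hypothetical helper, not the actual train_multi_KD3.py code:

```python
# Sketch: recombine the three KD losses into the logged total, using the
# scales from the run config (beats 1.0, ecapa 10.0, whisper 1.0).
# total_loss() is a hypothetical helper, not an icefall function.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Figures from the "Epoch 7, batch 11500" entry above:
print(round(total_loss(0.009626, 0.0001868, 0.09293), 4))  # -> 0.1044
```

Note that in the validation entries only one component is nonzero per task (e.g. SV_voxceleb1 reports only ecapa_loss), so each task's validation loss is that component alone under the same weighting.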
2024-08-11 07:42:18,255 INFO [train_multi_KD3.py:1149] (3/4) Epoch 7, validation on AT_audioset: loss=0.02554, beats_loss=0.02554, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 07:42:18,259 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 07:42:24,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0 2024-08-11 07:42:40,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-11 07:42:44,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=989690.0, ans=0.0 2024-08-11 07:42:57,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=989790.0, ans=0.0 2024-08-11 07:43:12,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=989890.0, ans=0.0 2024-08-11 07:43:12,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=12.0 2024-08-11 07:43:14,637 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-11 07:43:21,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.747e+01 3.219e+01 3.881e+01 9.695e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 07:43:26,721 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 from AS 2024-08-11 07:43:35,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12050, loss[loss=0.1077, beats_loss=0.01015, ecapa_loss=0.0001989, whisper_loss=0.09556, over 23562.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0002069, whisper_loss=0.09386, over 3857852.41 frames. ], batch size: 94, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:43:40,657 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 24 from Vox, 27 from AS 2024-08-11 07:43:45,661 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 07:44:03,566 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 from AS 2024-08-11 07:44:23,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=990390.0, ans=0.125 2024-08-11 07:44:27,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=990390.0, ans=0.0 2024-08-11 07:44:30,341 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 07:44:44,838 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 07:44:50,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12100, loss[loss=0.1176, beats_loss=0.01057, ecapa_loss=0.0002344, whisper_loss=0.1046, over 16226.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01153, ecapa_loss=0.0002061, whisper_loss=0.09312, over 3867028.28 frames. ], batch size: 65, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:45:02,392 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 07:45:16,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=990690.0, ans=0.09899494936611666 2024-08-11 07:45:21,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=990790.0, ans=0.0 2024-08-11 07:45:21,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-11 07:45:40,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=990890.0, ans=0.0 2024-08-11 07:45:54,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.799e+01 3.089e+01 3.650e+01 5.391e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 07:45:57,041 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 07:46:04,414 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.600e-01 2024-08-11 07:46:10,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12150, loss[loss=0.1347, beats_loss=0.009919, ecapa_loss=0.0001648, whisper_loss=0.1231, over 21268.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01149, ecapa_loss=0.0002065, whisper_loss=0.09294, over 3864105.91 frames. 
], batch size: 79, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:46:28,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=991190.0, ans=0.0 2024-08-11 07:46:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991190.0, ans=0.1 2024-08-11 07:46:38,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=991190.0, ans=0.125 2024-08-11 07:46:53,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991290.0, ans=0.1 2024-08-11 07:46:54,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991290.0, ans=0.1 2024-08-11 07:47:13,429 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 from AS 2024-08-11 07:47:13,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=991490.0, ans=0.125 2024-08-11 07:47:19,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2024-08-11 07:47:23,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2024-08-11 07:47:30,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12200, loss[loss=0.1172, beats_loss=0.009501, ecapa_loss=0.0002371, whisper_loss=0.1054, over 22359.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01141, ecapa_loss=0.0002078, whisper_loss=0.09294, over 3853963.76 frames. 
], batch size: 89, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:47:42,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=991590.0, ans=10.0 2024-08-11 07:47:48,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991690.0, ans=0.125 2024-08-11 07:47:54,246 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 07:48:03,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=991790.0, ans=0.0 2024-08-11 07:48:09,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=991790.0, ans=0.125 2024-08-11 07:48:12,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=991790.0, ans=0.125 2024-08-11 07:48:12,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0 2024-08-11 07:48:17,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=991890.0, ans=0.125 2024-08-11 07:48:17,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=991890.0, ans=0.0 2024-08-11 07:48:29,930 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 9 from Vox, 37 from AS 2024-08-11 07:48:31,182 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 07:48:34,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=991990.0, ans=0.125 2024-08-11 07:48:35,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.629e+01 2.882e+01 3.326e+01 5.595e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 07:48:49,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12250, loss[loss=0.1197, beats_loss=0.009876, ecapa_loss=0.0002129, whisper_loss=0.1077, over 22823.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01148, ecapa_loss=0.0002063, whisper_loss=0.09276, over 3828804.02 frames. ], batch size: 90, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:48:59,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=992090.0, ans=0.125 2024-08-11 07:49:05,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=992190.0, ans=0.125 2024-08-11 07:49:12,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=992190.0, ans=0.09899494936611666 2024-08-11 07:49:15,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=992190.0, ans=0.125 2024-08-11 07:49:18,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2024-08-11 07:49:30,340 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 07:49:33,333 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 from AS 2024-08-11 07:49:34,679 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 07:49:37,572 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 12 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 07:49:46,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=992390.0, ans=0.0 2024-08-11 07:49:52,274 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 40 from LS+wenet, 14 from Vox, 32 from AS 2024-08-11 07:50:08,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12300, loss[loss=0.1032, beats_loss=0.01393, ecapa_loss=0.0001708, whisper_loss=0.08756, over 19222.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01137, ecapa_loss=0.0002061, whisper_loss=0.09323, over 3841209.86 frames. ], batch size: 77, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:50:13,263 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 07:50:35,167 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 from AS 2024-08-11 07:50:39,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992690.0, ans=0.1 2024-08-11 07:50:49,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-11 07:50:50,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=992790.0, ans=15.0 2024-08-11 07:50:58,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992890.0, ans=0.1 2024-08-11 07:51:09,846 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 17 from Vox, 43 from AS 2024-08-11 07:51:12,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.794e+01 3.118e+01 3.585e+01 7.136e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-11 07:51:27,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12350, loss[loss=0.1377, beats_loss=0.007581, ecapa_loss=0.0002098, whisper_loss=0.128, over 18681.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01137, ecapa_loss=0.0002054, whisper_loss=0.09372, over 3854233.13 frames. ], batch size: 73, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:51:31,879 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 from AS 2024-08-11 07:51:48,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=993190.0, ans=0.125 2024-08-11 07:51:50,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=993190.0, ans=0.125 2024-08-11 07:51:53,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993190.0, ans=0.1 2024-08-11 07:52:16,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993390.0, ans=0.1 2024-08-11 07:52:16,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=993390.0, ans=0.125 2024-08-11 07:52:21,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-11 07:52:31,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. 
limit=6.0 2024-08-11 07:52:41,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12400, loss[loss=0.1083, beats_loss=0.01096, ecapa_loss=0.0001789, whisper_loss=0.09554, over 23083.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0002035, whisper_loss=0.09331, over 3867016.55 frames. ], batch size: 89, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:52:43,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=993590.0, ans=0.0 2024-08-11 07:52:54,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=993590.0, ans=0.125 2024-08-11 07:53:04,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993690.0, ans=0.1 2024-08-11 07:53:25,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2024-08-11 07:53:30,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=993890.0, ans=0.2 2024-08-11 07:53:47,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.944e+01 3.370e+01 3.888e+01 6.179e+01, threshold=6.739e+01, percent-clipped=0.0 2024-08-11 07:53:56,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=993990.0, ans=0.125 2024-08-11 07:54:01,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12450, loss[loss=0.1123, beats_loss=0.0122, ecapa_loss=0.0002321, whisper_loss=0.09777, over 18239.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0002046, whisper_loss=0.0936, over 3903524.68 frames. 
], batch size: 76, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:54:12,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5 2024-08-11 07:54:13,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=994090.0, ans=0.2 2024-08-11 07:54:17,809 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 07:54:30,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=994190.0, ans=0.07 2024-08-11 07:54:36,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=994290.0, ans=0.0 2024-08-11 07:54:55,570 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 07:54:56,810 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 07:54:59,816 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 07:55:19,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12500, loss[loss=0.1253, beats_loss=0.01037, ecapa_loss=0.0001811, whisper_loss=0.1131, over 22582.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01132, ecapa_loss=0.0002047, whisper_loss=0.09372, over 3898641.22 frames. 
], batch size: 85, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:55:22,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=994590.0, ans=0.125 2024-08-11 07:55:59,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=994790.0, ans=0.2 2024-08-11 07:56:09,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=994890.0, ans=0.1 2024-08-11 07:56:23,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.789e+01 3.126e+01 3.797e+01 5.980e+01, threshold=6.252e+01, percent-clipped=0.0 2024-08-11 07:56:30,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=994990.0, ans=0.125 2024-08-11 07:56:34,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-11 07:56:37,050 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12550, loss[loss=0.1134, beats_loss=0.01097, ecapa_loss=0.0001918, whisper_loss=0.1005, over 14054.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01139, ecapa_loss=0.0002039, whisper_loss=0.09397, over 3917344.62 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:56:48,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=995090.0, ans=0.125 2024-08-11 07:57:17,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. 
limit=6.0 2024-08-11 07:57:26,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=995390.0, ans=0.2 2024-08-11 07:57:40,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-11 07:57:56,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12600, loss[loss=0.1118, beats_loss=0.01012, ecapa_loss=0.000183, whisper_loss=0.09988, over 15204.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01139, ecapa_loss=0.000205, whisper_loss=0.09408, over 3920913.31 frames. ], batch size: 58, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:58:13,556 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 07:58:17,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=995690.0, ans=0.125 2024-08-11 07:58:41,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=995790.0, ans=0.125 2024-08-11 07:58:44,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=995890.0, ans=0.125 2024-08-11 07:58:47,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=995890.0, ans=0.1 2024-08-11 07:58:53,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=995890.0, ans=0.125 2024-08-11 07:59:00,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.576e+01 3.023e+01 3.555e+01 7.578e+01, threshold=6.047e+01, percent-clipped=3.0 2024-08-11 07:59:04,757 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 07:59:17,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12650, loss[loss=0.1201, beats_loss=0.009224, ecapa_loss=0.0002333, whisper_loss=0.1086, over 21821.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01141, ecapa_loss=0.0002062, whisper_loss=0.09355, over 3941794.86 frames. ], batch size: 84, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:59:28,789 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 35 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 07:59:38,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=996190.0, ans=0.0 2024-08-11 08:00:08,571 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 08:00:19,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=996390.0, ans=0.125 2024-08-11 08:00:21,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-08-11 08:00:42,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12700, loss[loss=0.1233, beats_loss=0.01038, ecapa_loss=0.0001919, whisper_loss=0.111, over 17390.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01146, ecapa_loss=0.0002052, whisper_loss=0.09339, over 3927578.97 frames. ], batch size: 67, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:00:56,491 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 08:01:12,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=996690.0, ans=0.125 2024-08-11 08:01:37,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=996890.0, ans=0.125 2024-08-11 08:01:42,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-11 08:01:53,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.625e+01 2.937e+01 3.351e+01 6.413e+01, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 08:01:58,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=996990.0, ans=0.2 2024-08-11 08:01:58,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=996990.0, ans=0.1 2024-08-11 08:02:00,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=996990.0, ans=0.125 2024-08-11 08:02:09,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2024-08-11 08:02:10,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12750, loss[loss=0.08981, beats_loss=0.01164, ecapa_loss=0.0002549, whisper_loss=0.07562, over 18257.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002052, whisper_loss=0.09355, over 3939715.59 frames. 
], batch size: 79, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:02:10,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=997090.0, ans=0.2 2024-08-11 08:02:13,205 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 08:02:25,043 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-11 08:02:54,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=997290.0, ans=0.1 2024-08-11 08:02:57,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=12.0 2024-08-11 08:03:16,655 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 08:03:18,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=997490.0, ans=0.1 2024-08-11 08:03:25,384 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 23 from Vox, 15 fro AS 2024-08-11 08:03:29,675 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 08:03:32,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=997590.0, ans=0.125 2024-08-11 08:03:32,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12800, loss[loss=0.1186, beats_loss=0.008918, ecapa_loss=0.0002277, whisper_loss=0.1074, over 20504.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01154, ecapa_loss=0.0002057, whisper_loss=0.09339, over 3954330.01 frames. 
], batch size: 81, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:03:47,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=997590.0, ans=0.125 2024-08-11 08:04:11,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=15.0 2024-08-11 08:04:40,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2024-08-11 08:04:42,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.631e+01 3.014e+01 3.452e+01 5.658e+01, threshold=6.028e+01, percent-clipped=0.0 2024-08-11 08:04:44,657 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 08:04:52,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=997990.0, ans=0.0 2024-08-11 08:04:56,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12850, loss[loss=0.09017, beats_loss=0.0129, ecapa_loss=0.000202, whisper_loss=0.07526, over 14929.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01151, ecapa_loss=0.0002069, whisper_loss=0.09327, over 3935550.84 frames. ], batch size: 62, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:04:57,482 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 08:05:07,654 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 08:05:18,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=998190.0, ans=0.125 2024-08-11 08:05:54,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=998390.0, ans=0.0 2024-08-11 08:06:06,742 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 08:06:13,117 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 08:06:17,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12900, loss[loss=0.1156, beats_loss=0.0117, ecapa_loss=0.0002072, whisper_loss=0.1019, over 22302.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01153, ecapa_loss=0.0002069, whisper_loss=0.09293, over 3906773.15 frames. ], batch size: 89, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:06:18,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998590.0, ans=0.1 2024-08-11 08:06:42,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=998690.0, ans=0.2 2024-08-11 08:07:04,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2024-08-11 08:07:06,416 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
23 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-11 08:07:06,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=998890.0, ans=0.0 2024-08-11 08:07:12,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=998890.0, ans=0.125 2024-08-11 08:07:24,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.613e+01 2.962e+01 3.305e+01 5.857e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 08:07:25,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=998990.0, ans=0.04949747468305833 2024-08-11 08:07:37,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-11 08:07:41,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 12950, loss[loss=0.09971, beats_loss=0.01077, ecapa_loss=0.0002276, whisper_loss=0.08666, over 16038.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01149, ecapa_loss=0.0002058, whisper_loss=0.09251, over 3889238.96 frames. ], batch size: 65, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:07:57,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-11 08:08:28,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=999290.0, ans=0.125 2024-08-11 08:08:33,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=999390.0, ans=0.125 2024-08-11 08:08:44,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.12 vs. 
limit=15.0 2024-08-11 08:08:48,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=999390.0, ans=0.125 2024-08-11 08:08:51,330 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:09:11,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13000, loss[loss=0.08842, beats_loss=0.01328, ecapa_loss=0.0002053, whisper_loss=0.07309, over 20115.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01144, ecapa_loss=0.0002056, whisper_loss=0.09319, over 3920727.33 frames. ], batch size: 81, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:09:40,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=999690.0, ans=0.0 2024-08-11 08:09:42,117 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 08:09:44,145 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 26 from Vox, 18 fro AS 2024-08-11 08:09:47,637 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 08:09:47,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=999790.0, ans=0.0 2024-08-11 08:09:59,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=999790.0, ans=0.0 2024-08-11 08:10:17,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=999890.0, ans=0.125 2024-08-11 08:10:25,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.746e+01 3.044e+01 3.535e+01 5.645e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:10:25,237 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 08:10:39,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13050, loss[loss=0.1228, beats_loss=0.01131, ecapa_loss=0.0001675, whisper_loss=0.1098, over 20199.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01142, ecapa_loss=0.0002041, whisper_loss=0.09341, over 3901381.49 frames. ], batch size: 75, lr: 8.74e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:10:43,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0 2024-08-11 08:10:44,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000090.0, ans=0.1 2024-08-11 08:10:53,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000090.0, ans=0.1 2024-08-11 08:10:58,221 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 08:11:40,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1000490.0, ans=0.125 2024-08-11 08:11:49,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1000490.0, ans=0.125 2024-08-11 08:11:55,738 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13100, loss[loss=0.09642, beats_loss=0.01139, ecapa_loss=0.0002392, whisper_loss=0.08263, over 23001.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01144, ecapa_loss=0.000205, whisper_loss=0.09264, over 3881498.73 frames. ], batch size: 94, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:11:58,836 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 08:12:06,379 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 08:12:19,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1000690.0, ans=0.125 2024-08-11 08:12:29,569 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 08:12:32,908 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 08:12:49,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1000890.0, ans=0.09899494936611666 2024-08-11 08:12:51,000 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 08:12:54,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.920e+01 3.431e+01 3.898e+01 1.839e+02, threshold=6.862e+01, percent-clipped=3.0 2024-08-11 08:13:01,176 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 08:13:07,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13150, loss[loss=0.1096, beats_loss=0.01254, ecapa_loss=0.0002009, whisper_loss=0.09507, over 21866.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01148, ecapa_loss=0.0002037, whisper_loss=0.0927, over 3871849.85 frames. ], batch size: 90, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:13:15,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1001090.0, ans=0.125 2024-08-11 08:13:29,005 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 08:13:38,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1001290.0, ans=0.09899494936611666 2024-08-11 08:14:02,976 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 08:14:03,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1001390.0, ans=0.125 2024-08-11 08:14:17,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1001490.0, ans=0.125 2024-08-11 08:14:20,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13200, loss[loss=0.124, beats_loss=0.00966, ecapa_loss=0.0002234, whisper_loss=0.1121, over 17015.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01135, ecapa_loss=0.0002047, whisper_loss=0.09348, over 3860179.65 frames. ], batch size: 70, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:14:22,881 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 27 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-11 08:14:37,027 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 08:15:00,291 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 08:15:07,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-11 08:15:16,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1001890.0, ans=15.0 2024-08-11 08:15:22,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.762e+01 3.091e+01 3.560e+01 4.785e+01, threshold=6.182e+01, percent-clipped=0.0 2024-08-11 08:15:26,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. 
limit=15.0 2024-08-11 08:15:33,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1001990.0, ans=0.125 2024-08-11 08:15:36,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13250, loss[loss=0.1012, beats_loss=0.01119, ecapa_loss=0.0002622, whisper_loss=0.08742, over 18796.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01138, ecapa_loss=0.0002055, whisper_loss=0.09329, over 3851291.27 frames. ], batch size: 78, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:15:39,887 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 08:15:41,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1002090.0, ans=0.2 2024-08-11 08:15:49,117 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 08:15:59,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-11 08:16:04,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1002190.0, ans=0.125 2024-08-11 08:16:09,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1002290.0, ans=0.0 2024-08-11 08:16:42,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-11 08:16:46,353 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 08:16:47,522 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 08:16:51,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13300, loss[loss=0.08117, beats_loss=0.01239, ecapa_loss=0.0002807, whisper_loss=0.06597, over 19696.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01139, ecapa_loss=0.0002065, whisper_loss=0.09323, over 3853393.24 frames. ], batch size: 90, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:17:04,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-11 08:17:50,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1002890.0, ans=0.1 2024-08-11 08:17:55,194 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.836e-02 2024-08-11 08:17:55,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.657e+01 3.097e+01 3.589e+01 1.012e+02, threshold=6.194e+01, percent-clipped=1.0 2024-08-11 08:18:10,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13350, loss[loss=0.1007, beats_loss=0.01091, ecapa_loss=0.0002019, whisper_loss=0.08774, over 20145.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0002066, whisper_loss=0.09372, over 3886113.16 frames. ], batch size: 77, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:18:17,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1003090.0, ans=0.125 2024-08-11 08:18:31,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1003190.0, ans=0.0 2024-08-11 08:18:37,946 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 08:18:48,206 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
22 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 08:18:50,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-11 08:18:57,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1003390.0, ans=0.1 2024-08-11 08:18:59,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1003390.0, ans=0.125 2024-08-11 08:19:11,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1003490.0, ans=0.0 2024-08-11 08:19:15,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1003490.0, ans=0.0 2024-08-11 08:19:17,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1003490.0, ans=0.125 2024-08-11 08:19:29,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13400, loss[loss=0.09826, beats_loss=0.01131, ecapa_loss=0.0001902, whisper_loss=0.08505, over 17158.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01137, ecapa_loss=0.0002054, whisper_loss=0.09378, over 3883232.31 frames. ], batch size: 66, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:19:33,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1003590.0, ans=0.125 2024-08-11 08:19:39,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. 
limit=12.0 2024-08-11 08:19:49,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1003690.0, ans=0.0 2024-08-11 08:20:07,442 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 08:20:34,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.700e+01 3.139e+01 3.511e+01 8.019e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-11 08:20:36,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1003990.0, ans=0.0 2024-08-11 08:20:39,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1003990.0, ans=0.0 2024-08-11 08:20:47,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13450, loss[loss=0.1074, beats_loss=0.01284, ecapa_loss=0.0001858, whisper_loss=0.09265, over 23400.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01137, ecapa_loss=0.0002064, whisper_loss=0.09383, over 3939911.06 frames. ], batch size: 94, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:20:54,572 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 08:21:05,433 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 08:21:18,283 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 08:21:42,213 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 08:21:43,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=15.0 2024-08-11 08:21:48,922 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 08:22:05,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13500, loss[loss=0.08337, beats_loss=0.01392, ecapa_loss=0.0001783, whisper_loss=0.06767, over 14712.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01135, ecapa_loss=0.0002053, whisper_loss=0.09442, over 3913149.96 frames. ], batch size: 61, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:22:21,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2024-08-11 08:22:35,170 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 08:22:44,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1004790.0, ans=0.1 2024-08-11 08:23:04,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.720e+01 3.065e+01 3.481e+01 5.636e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-11 08:23:18,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13550, loss[loss=0.09004, beats_loss=0.01298, ecapa_loss=0.0002076, whisper_loss=0.07498, over 17526.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01143, ecapa_loss=0.000205, whisper_loss=0.09369, over 3883146.53 frames. ], batch size: 71, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:23:21,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:22,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. 
limit=15.0 2024-08-11 08:23:42,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1005190.0, ans=0.0 2024-08-11 08:23:43,442 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 08:24:26,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1005490.0, ans=0.0 2024-08-11 08:24:28,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1005490.0, ans=0.125 2024-08-11 08:24:32,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13600, loss[loss=0.1074, beats_loss=0.01011, ecapa_loss=0.0002743, whisper_loss=0.09452, over 14668.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01151, ecapa_loss=0.0002047, whisper_loss=0.09355, over 3867184.63 frames. ], batch size: 60, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:25:02,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1005790.0, ans=0.1 2024-08-11 08:25:04,660 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 08:25:07,868 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 08:25:08,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1005790.0, ans=0.125 2024-08-11 08:25:18,681 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 08:25:21,602 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 08:25:26,366 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 08:25:26,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1005890.0, ans=0.0 2024-08-11 08:25:31,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.811e+01 3.158e+01 3.669e+01 1.616e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 08:25:44,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13650, loss[loss=0.1154, beats_loss=0.01082, ecapa_loss=0.0002579, whisper_loss=0.102, over 21214.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0115, ecapa_loss=0.0002071, whisper_loss=0.09375, over 3883911.41 frames. ], batch size: 89, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:26:15,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.46 vs. limit=10.0 2024-08-11 08:26:26,906 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 08:26:34,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1006390.0, ans=0.125 2024-08-11 08:26:43,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2024-08-11 08:27:00,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13700, loss[loss=0.09787, beats_loss=0.01296, ecapa_loss=0.0001858, whisper_loss=0.08305, over 17110.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01147, ecapa_loss=0.0002072, whisper_loss=0.09408, over 3910570.57 frames. 
], batch size: 68, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:27:05,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1006590.0, ans=0.125 2024-08-11 08:27:06,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1006590.0, ans=0.125 2024-08-11 08:27:17,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1006690.0, ans=0.2 2024-08-11 08:27:26,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1006690.0, ans=0.125 2024-08-11 08:27:33,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1006790.0, ans=0.125 2024-08-11 08:27:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1006890.0, ans=0.09899494936611666 2024-08-11 08:27:48,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006890.0, ans=0.125 2024-08-11 08:28:02,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.699e+01 3.024e+01 3.641e+01 8.253e+01, threshold=6.049e+01, percent-clipped=1.0 2024-08-11 08:28:15,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13750, loss[loss=0.1185, beats_loss=0.0102, ecapa_loss=0.0002092, whisper_loss=0.1062, over 21321.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0002079, whisper_loss=0.09346, over 3896265.78 frames. ], batch size: 84, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:28:21,790 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 08:28:40,137 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 08:28:50,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1007290.0, ans=0.125 2024-08-11 08:29:10,138 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 08:29:19,670 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 08:29:27,359 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 08:29:30,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13800, loss[loss=0.1052, beats_loss=0.01152, ecapa_loss=0.0001873, whisper_loss=0.09185, over 23216.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01144, ecapa_loss=0.0002072, whisper_loss=0.09295, over 3857347.59 frames. ], batch size: 90, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:29:33,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007590.0, ans=0.1 2024-08-11 08:29:53,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007690.0, ans=0.1 2024-08-11 08:29:57,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1007690.0, ans=0.125 2024-08-11 08:29:57,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2024-08-11 08:30:00,218 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 08:30:14,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007790.0, ans=0.1 2024-08-11 08:30:14,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1007790.0, ans=0.0 2024-08-11 08:30:25,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1007890.0, ans=0.0 2024-08-11 08:30:35,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.572e+01 2.803e+01 3.077e+01 5.296e+01, threshold=5.605e+01, percent-clipped=0.0 2024-08-11 08:30:49,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13850, loss[loss=0.1196, beats_loss=0.01225, ecapa_loss=0.0002051, whisper_loss=0.1053, over 23079.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01145, ecapa_loss=0.0002065, whisper_loss=0.093, over 3864866.52 frames. ], batch size: 94, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:30:57,447 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 08:30:57,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1008090.0, ans=0.0 2024-08-11 08:31:10,382 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 08:31:14,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1008190.0, ans=0.125 2024-08-11 08:31:28,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1008290.0, ans=0.2 2024-08-11 08:31:29,732 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 08:31:39,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1008390.0, ans=0.125 2024-08-11 08:31:47,447 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 08:31:49,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=12.0 2024-08-11 08:32:10,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13900, loss[loss=0.1249, beats_loss=0.009151, ecapa_loss=0.0002016, whisper_loss=0.1138, over 23336.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01145, ecapa_loss=0.0002056, whisper_loss=0.09382, over 3893988.96 frames. ], batch size: 92, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:32:17,366 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 08:32:38,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1008690.0, ans=0.125 2024-08-11 08:32:43,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1008790.0, ans=0.125 2024-08-11 08:33:00,575 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 08:33:09,332 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 08:33:14,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.808e+01 3.104e+01 3.560e+01 5.037e+01, threshold=6.208e+01, percent-clipped=0.0 2024-08-11 08:33:17,492 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 08:33:27,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1009090.0, ans=0.125 2024-08-11 08:33:28,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 13950, loss[loss=0.0827, beats_loss=0.01295, ecapa_loss=0.0001966, whisper_loss=0.06778, over 15072.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01154, ecapa_loss=0.0002045, whisper_loss=0.09308, over 3883728.25 frames. ], batch size: 61, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:33:33,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1009090.0, ans=0.125 2024-08-11 08:33:41,504 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 08:33:48,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1009190.0, ans=0.1 2024-08-11 08:34:04,730 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 08:34:29,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1009390.0, ans=0.0 2024-08-11 08:34:37,832 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 08:34:39,629 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 08:34:41,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1009490.0, ans=0.0 2024-08-11 08:34:47,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. 
limit=6.0 2024-08-11 08:34:48,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14000, loss[loss=0.1115, beats_loss=0.01197, ecapa_loss=0.0001707, whisper_loss=0.09777, over 17755.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01154, ecapa_loss=0.0002026, whisper_loss=0.09277, over 3873584.78 frames. ], batch size: 69, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:34:51,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1009590.0, ans=0.125 2024-08-11 08:35:14,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1009690.0, ans=0.125 2024-08-11 08:35:17,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-08-11 08:35:21,547 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 08:35:27,276 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:35:42,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1009890.0, ans=0.0 2024-08-11 08:35:57,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.710e+01 3.006e+01 3.538e+01 6.784e+01, threshold=6.013e+01, percent-clipped=1.0 2024-08-11 08:36:01,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1009990.0, ans=0.1 2024-08-11 08:36:02,823 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 08:36:12,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14050, loss[loss=0.08766, beats_loss=0.0127, ecapa_loss=0.000179, whisper_loss=0.07317, over 23396.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01149, ecapa_loss=0.0002031, whisper_loss=0.09301, over 3869777.45 frames. ], batch size: 94, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:36:18,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=22.5 2024-08-11 08:36:43,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1010290.0, ans=0.2 2024-08-11 08:37:00,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1010390.0, ans=0.1 2024-08-11 08:37:02,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1010390.0, ans=0.125 2024-08-11 08:37:13,188 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 08:37:24,796 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 08:37:37,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.28 vs. limit=10.0 2024-08-11 08:37:37,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14100, loss[loss=0.1162, beats_loss=0.009788, ecapa_loss=0.0002584, whisper_loss=0.1039, over 21717.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01149, ecapa_loss=0.0002033, whisper_loss=0.09358, over 3853707.04 frames. 
], batch size: 92, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:37:42,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2024-08-11 08:37:46,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2024-08-11 08:38:02,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1010690.0, ans=0.025 2024-08-11 08:38:14,800 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 08:38:47,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1010990.0, ans=0.125 2024-08-11 08:38:49,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.624e+01 2.945e+01 3.408e+01 4.744e+01, threshold=5.889e+01, percent-clipped=0.0 2024-08-11 08:38:52,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-11 08:39:04,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1011090.0, ans=0.0 2024-08-11 08:39:05,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14150, loss[loss=0.1108, beats_loss=0.01161, ecapa_loss=0.000162, whisper_loss=0.09754, over 16443.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0115, ecapa_loss=0.0002026, whisper_loss=0.09377, over 3859749.69 frames. ], batch size: 63, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:39:10,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. 
limit=15.0 2024-08-11 08:39:53,136 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 08:39:59,490 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 08:40:03,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1011390.0, ans=0.125 2024-08-11 08:40:24,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-11 08:40:31,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14200, loss[loss=0.09304, beats_loss=0.01119, ecapa_loss=0.000239, whisper_loss=0.07946, over 22342.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01143, ecapa_loss=0.0002019, whisper_loss=0.09468, over 3901604.57 frames. ], batch size: 93, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:40:45,551 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 08:40:48,969 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 08:40:51,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1011690.0, ans=0.0 2024-08-11 08:40:51,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2024-08-11 08:41:02,362 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 08:41:09,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.47 vs. 
limit=22.5 2024-08-11 08:41:14,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011790.0, ans=0.1 2024-08-11 08:41:31,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1011890.0, ans=0.0 2024-08-11 08:42:03,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2024-08-11 08:42:07,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-11 08:42:09,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1011990.0, ans=0.125 2024-08-11 08:42:12,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.734e+01 3.043e+01 3.584e+01 5.331e+01, threshold=6.086e+01, percent-clipped=0.0 2024-08-11 08:42:29,561 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14250, loss[loss=0.1179, beats_loss=0.00968, ecapa_loss=0.0001408, whisper_loss=0.1068, over 16630.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01147, ecapa_loss=0.0002004, whisper_loss=0.09353, over 3903556.33 frames. ], batch size: 58, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:43:16,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. 
limit=15.0 2024-08-11 08:43:20,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1012290.0, ans=0.125 2024-08-11 08:43:26,275 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.545e+02 2024-08-11 08:43:37,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1012390.0, ans=0.0 2024-08-11 08:43:45,864 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-11 08:43:46,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=23.23 vs. limit=15.0 2024-08-11 08:43:57,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14300, loss[loss=0.09898, beats_loss=0.01084, ecapa_loss=0.0002254, whisper_loss=0.08589, over 19107.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01149, ecapa_loss=0.000201, whisper_loss=0.09326, over 3929672.80 frames. ], batch size: 81, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:43:58,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1012590.0, ans=0.125 2024-08-11 08:44:05,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=12.0 2024-08-11 08:44:11,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-11 08:44:19,768 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 08:44:28,405 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.143e+00 2024-08-11 08:44:33,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1012790.0, ans=0.125 2024-08-11 08:44:33,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1012790.0, ans=0.1 2024-08-11 08:44:35,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1012790.0, ans=0.125 2024-08-11 08:44:52,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1012890.0, ans=0.125 2024-08-11 08:45:05,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.720e+01 3.044e+01 3.421e+01 5.497e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:45:11,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-11 08:45:17,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-11 08:45:19,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14350, loss[loss=0.09205, beats_loss=0.009931, ecapa_loss=0.0002367, whisper_loss=0.07976, over 16050.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01143, ecapa_loss=0.0002019, whisper_loss=0.0935, over 3925673.48 frames. 
], batch size: 68, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:45:21,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1013090.0, ans=0.2 2024-08-11 08:45:54,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013290.0, ans=0.1 2024-08-11 08:46:01,710 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 08:46:02,115 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.183e-02 2024-08-11 08:46:13,716 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 08:46:17,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1013390.0, ans=0.2 2024-08-11 08:46:18,747 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-11 08:46:23,993 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 08:46:26,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013490.0, ans=0.1 2024-08-11 08:46:26,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-11 08:46:27,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1013490.0, ans=0.125 2024-08-11 08:46:41,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14400, loss[loss=0.132, beats_loss=0.009239, ecapa_loss=0.0002514, whisper_loss=0.1203, over 17350.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01146, ecapa_loss=0.0002014, whisper_loss=0.09355, over 3920736.34 frames. ], batch size: 71, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:46:44,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-08-11 08:47:46,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.704e+01 3.131e+01 3.618e+01 5.413e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 08:48:00,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 7, batch 14450, loss[loss=0.09674, beats_loss=0.01152, ecapa_loss=0.0002434, whisper_loss=0.08279, over 20917.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002032, whisper_loss=0.09405, over 3898604.78 frames. ], batch size: 88, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:48:17,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-11 08:48:17,758 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 08:48:22,622 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 08:48:42,196 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.308e-02 2024-08-11 08:49:46,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 0, loss[loss=0.08663, beats_loss=0.01378, ecapa_loss=0.0002068, whisper_loss=0.07078, over 22763.00 frames. ], tot_loss[loss=0.08663, beats_loss=0.01378, ecapa_loss=0.0002068, whisper_loss=0.07078, over 22763.00 frames. 
], batch size: 95, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:49:46,116 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 08:50:29,116 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on ASR_libri: loss=0.2579, beats_loss=0, ecapa_loss=0.0006499, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 08:50:45,228 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on SV_voxceleb1: loss=0.005446, beats_loss=0, ecapa_loss=0.0005446, whisper_loss=0, over 939242.00 frames. 2024-08-11 08:52:49,638 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on AT_audioset: loss=0.02532, beats_loss=0.02532, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 08:52:49,644 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 08:52:58,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1014470.0, ans=0.125 2024-08-11 08:53:00,669 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 08:53:13,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1014470.0, ans=0.125 2024-08-11 08:54:10,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5 2024-08-11 08:54:31,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2024-08-11 08:54:32,396 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 08:54:44,669 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 08:55:05,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 50, loss[loss=0.1164, beats_loss=0.008363, ecapa_loss=0.000246, whisper_loss=0.1056, over 22719.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01102, ecapa_loss=0.0002092, whisper_loss=0.09065, over 898927.04 frames. ], batch size: 91, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:55:06,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1014970.0, ans=0.125 2024-08-11 08:55:12,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.926e+01 3.335e+01 3.829e+01 6.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-11 08:55:46,120 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 08:55:57,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1015170.0, ans=0.125 2024-08-11 08:56:24,366 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 08:56:26,988 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 08:56:30,849 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 08:56:39,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1015270.0, ans=0.0 2024-08-11 08:57:07,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 100, loss[loss=0.1079, beats_loss=0.01074, ecapa_loss=0.0002082, whisper_loss=0.09511, over 16153.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01105, ecapa_loss=0.0002051, whisper_loss=0.09023, over 1553099.39 frames. 
], batch size: 63, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:57:16,439 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 08:57:24,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1015470.0, ans=0.0 2024-08-11 08:57:26,876 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-11 08:58:03,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1015670.0, ans=0.125 2024-08-11 08:58:22,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-11 08:58:58,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 150, loss[loss=0.08711, beats_loss=0.0136, ecapa_loss=0.0002079, whisper_loss=0.07143, over 14840.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0002018, whisper_loss=0.09148, over 2053964.23 frames. ], batch size: 61, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:59:04,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.999e+01 3.323e+01 3.859e+01 6.934e+01, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 08:59:06,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1015970.0, ans=0.125 2024-08-11 08:59:39,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1016170.0, ans=0.125 2024-08-11 08:59:52,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1016270.0, ans=0.125 2024-08-11 08:59:53,910 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 08:59:57,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1016270.0, ans=0.125 2024-08-11 09:00:18,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1016370.0, ans=0.125 2024-08-11 09:00:22,639 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 09:00:24,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 200, loss[loss=0.1071, beats_loss=0.0111, ecapa_loss=0.0001647, whisper_loss=0.0943, over 21395.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0002011, whisper_loss=0.09257, over 2436688.59 frames. ], batch size: 82, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:00:24,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1016470.0, ans=0.0 2024-08-11 09:00:33,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1016470.0, ans=0.125 2024-08-11 09:00:51,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1016570.0, ans=0.2 2024-08-11 09:00:52,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1016570.0, ans=0.025 2024-08-11 09:00:55,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-11 09:01:02,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2024-08-11 09:01:10,105 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 09:01:15,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1016770.0, ans=0.125 2024-08-11 09:01:24,854 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 09:01:25,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1016770.0, ans=0.125 2024-08-11 09:01:29,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1016870.0, ans=0.125 2024-08-11 09:01:39,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1016870.0, ans=0.125 2024-08-11 09:01:44,107 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 250, loss[loss=0.1122, beats_loss=0.01122, ecapa_loss=0.0002006, whisper_loss=0.09894, over 22637.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01103, ecapa_loss=0.0002012, whisper_loss=0.09344, over 2786688.73 frames. ], batch size: 88, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:01:48,899 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.577e+01 2.891e+01 3.229e+01 6.128e+01, threshold=5.781e+01, percent-clipped=0.0 2024-08-11 09:01:52,029 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 09:02:00,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1017070.0, ans=0.125 2024-08-11 09:02:36,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1017270.0, ans=0.0 2024-08-11 09:02:44,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 09:02:48,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1017370.0, ans=0.125 2024-08-11 09:03:01,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 300, loss[loss=0.1067, beats_loss=0.01056, ecapa_loss=0.000228, whisper_loss=0.09388, over 21826.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01098, ecapa_loss=0.0002007, whisper_loss=0.09357, over 3008813.71 frames. ], batch size: 92, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:03:16,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.97 vs. limit=22.5 2024-08-11 09:03:38,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2024-08-11 09:03:45,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0 2024-08-11 09:03:48,823 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 09:04:00,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1017770.0, ans=0.0 2024-08-11 09:04:17,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 350, loss[loss=0.09088, beats_loss=0.0123, ecapa_loss=0.0002114, whisper_loss=0.07647, over 14769.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01104, ecapa_loss=0.0002007, whisper_loss=0.09271, over 3146749.81 frames. ], batch size: 57, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:04:18,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. 
limit=15.0 2024-08-11 09:04:22,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.490e+01 2.836e+01 3.239e+01 6.329e+01, threshold=5.671e+01, percent-clipped=2.0 2024-08-11 09:04:37,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1018070.0, ans=0.125 2024-08-11 09:04:37,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1018070.0, ans=0.0 2024-08-11 09:04:48,556 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 09:04:53,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1018170.0, ans=0.125 2024-08-11 09:05:03,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1018270.0, ans=15.0 2024-08-11 09:05:33,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 400, loss[loss=0.1009, beats_loss=0.01237, ecapa_loss=0.000197, whisper_loss=0.08656, over 21052.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01107, ecapa_loss=0.0001985, whisper_loss=0.09336, over 3314595.78 frames. ], batch size: 83, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:05:41,863 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 09:05:42,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1018470.0, ans=0.125 2024-08-11 09:05:47,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. 
limit=12.0 2024-08-11 09:06:00,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1018570.0, ans=0.05 2024-08-11 09:06:07,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1018670.0, ans=0.125 2024-08-11 09:06:17,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1018670.0, ans=0.0 2024-08-11 09:06:20,745 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 09:06:22,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1018770.0, ans=0.0 2024-08-11 09:06:36,963 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 09:06:40,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1018870.0, ans=0.125 2024-08-11 09:06:50,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2024-08-11 09:06:51,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 450, loss[loss=0.1061, beats_loss=0.009552, ecapa_loss=0.0002159, whisper_loss=0.09438, over 22199.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001999, whisper_loss=0.0924, over 3429495.28 frames. 
], batch size: 88, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:06:55,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.612e+01 2.893e+01 3.369e+01 4.521e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-11 09:07:01,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1018970.0, ans=0.0 2024-08-11 09:07:01,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1018970.0, ans=0.0 2024-08-11 09:07:10,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2024-08-11 09:07:19,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=12.0 2024-08-11 09:07:22,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019170.0, ans=0.1 2024-08-11 09:07:25,937 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 09:07:49,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1019270.0, ans=0.125 2024-08-11 09:08:02,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1019370.0, ans=0.1 2024-08-11 09:08:08,518 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:08:09,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 500, loss[loss=0.1116, beats_loss=0.01011, ecapa_loss=0.000167, whisper_loss=0.09982, over 22166.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01121, ecapa_loss=0.0001987, whisper_loss=0.09197, over 3547341.98 frames. 
], batch size: 83, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:08:14,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1019470.0, ans=0.125 2024-08-11 09:08:15,395 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 09:08:22,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1019470.0, ans=0.125 2024-08-11 09:08:29,829 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 09:08:33,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019570.0, ans=0.1 2024-08-11 09:08:49,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0 2024-08-11 09:08:54,296 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 09:08:57,100 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 09:08:58,789 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 09:09:03,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019770.0, ans=0.1 2024-08-11 09:09:14,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1019870.0, ans=0.125 2024-08-11 09:09:15,640 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 09:09:21,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1019870.0, ans=0.125 2024-08-11 09:09:32,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 550, loss[loss=0.09757, beats_loss=0.01283, ecapa_loss=0.0001937, whisper_loss=0.0828, over 16215.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01129, ecapa_loss=0.0001966, whisper_loss=0.09149, over 3605174.90 frames. ], batch size: 65, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:09:32,524 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 09:09:37,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.649e+01 3.106e+01 3.487e+01 7.469e+01, threshold=6.212e+01, percent-clipped=4.0 2024-08-11 09:09:44,779 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 09:09:50,336 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 09:09:55,668 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 09:10:14,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1020170.0, ans=0.1 2024-08-11 09:10:18,643 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 09:10:41,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-11 09:10:47,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 600, loss[loss=0.1031, beats_loss=0.01031, ecapa_loss=0.0001865, whisper_loss=0.0909, over 20554.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01129, ecapa_loss=0.0001955, whisper_loss=0.09163, over 3664165.33 frames. ], batch size: 82, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:11:10,393 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 09:11:40,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1020770.0, ans=0.125 2024-08-11 09:11:51,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-11 09:12:02,005 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 09:12:03,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1020870.0, ans=0.125 2024-08-11 09:12:06,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 650, loss[loss=0.1215, beats_loss=0.01037, ecapa_loss=0.0001881, whisper_loss=0.1092, over 17348.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001955, whisper_loss=0.09256, over 3701508.42 frames. 
], batch size: 65, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:12:07,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1020970.0, ans=0.2 2024-08-11 09:12:09,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1020970.0, ans=0.125 2024-08-11 09:12:10,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.651e+01 2.850e+01 3.204e+01 4.737e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 09:12:23,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1021070.0, ans=0.125 2024-08-11 09:12:59,669 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 23 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-11 09:13:16,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2024-08-11 09:13:21,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.01 vs. limit=15.0 2024-08-11 09:13:21,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 700, loss[loss=0.114, beats_loss=0.01339, ecapa_loss=0.000207, whisper_loss=0.09855, over 21405.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001965, whisper_loss=0.093, over 3718320.57 frames. ], batch size: 87, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:14:03,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1021670.0, ans=0.0 2024-08-11 09:14:04,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.98 vs. 
limit=15.0 2024-08-11 09:14:37,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 750, loss[loss=0.1075, beats_loss=0.01302, ecapa_loss=0.0001685, whisper_loss=0.0928, over 23785.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001948, whisper_loss=0.09258, over 3745169.52 frames. ], batch size: 92, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:14:40,748 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 09:14:42,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.660e+01 3.127e+01 3.627e+01 6.783e+01, threshold=6.254e+01, percent-clipped=6.0 2024-08-11 09:15:00,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-08-11 09:15:05,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1022070.0, ans=0.0 2024-08-11 09:15:24,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1022270.0, ans=0.125 2024-08-11 09:15:25,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1022270.0, ans=0.0 2024-08-11 09:15:28,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1022270.0, ans=0.0 2024-08-11 09:15:39,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1022370.0, ans=0.07 2024-08-11 09:15:43,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1022370.0, ans=0.0 2024-08-11 09:15:47,167 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
19 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 09:15:48,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1022370.0, ans=0.1 2024-08-11 09:15:54,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 800, loss[loss=0.1081, beats_loss=0.01304, ecapa_loss=0.0001329, whisper_loss=0.09372, over 21992.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001949, whisper_loss=0.09235, over 3780452.19 frames. ], batch size: 83, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:15:58,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1022470.0, ans=0.025 2024-08-11 09:15:58,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1022470.0, ans=0.125 2024-08-11 09:15:59,096 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 09:16:07,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-11 09:16:08,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1022570.0, ans=0.125 2024-08-11 09:16:19,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1022570.0, ans=0.125 2024-08-11 09:16:20,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. 
limit=15.0 2024-08-11 09:16:31,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1022670.0, ans=0.2 2024-08-11 09:16:43,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1022770.0, ans=0.125 2024-08-11 09:16:44,343 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:16:59,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1022870.0, ans=0.125 2024-08-11 09:17:07,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 850, loss[loss=0.09416, beats_loss=0.01171, ecapa_loss=0.0001909, whisper_loss=0.08055, over 17538.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01128, ecapa_loss=0.0001946, whisper_loss=0.09224, over 3791157.45 frames. ], batch size: 67, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:17:11,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.661e+01 2.916e+01 3.361e+01 8.910e+01, threshold=5.831e+01, percent-clipped=1.0 2024-08-11 09:17:18,650 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 09:17:19,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1022970.0, ans=0.125 2024-08-11 09:17:30,639 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 09:17:32,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1023070.0, ans=0.95 2024-08-11 09:17:46,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1023170.0, ans=0.125 2024-08-11 09:17:48,702 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 09:17:49,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1023170.0, ans=0.125 2024-08-11 09:18:09,839 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 09:18:12,608 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 09:18:19,063 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 09:18:21,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 900, loss[loss=0.09648, beats_loss=0.01182, ecapa_loss=0.0001879, whisper_loss=0.08278, over 22209.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01128, ecapa_loss=0.0001944, whisper_loss=0.09223, over 3811057.46 frames. ], batch size: 90, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:18:42,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1023570.0, ans=0.0 2024-08-11 09:18:53,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1023670.0, ans=15.0 2024-08-11 09:19:03,455 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 09:19:11,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1023770.0, ans=0.125 2024-08-11 09:19:18,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1023770.0, ans=0.125 2024-08-11 09:19:31,432 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 21 from LS+wenet, 30 from Vox, 45 fro AS 2024-08-11 09:19:31,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1023870.0, ans=0.2 2024-08-11 09:19:36,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 950, loss[loss=0.1132, beats_loss=0.009187, ecapa_loss=0.0001744, whisper_loss=0.1023, over 20082.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01123, ecapa_loss=0.0001936, whisper_loss=0.09191, over 3791024.33 frames. ], batch size: 74, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:19:37,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.10 vs. 
limit=10.0 2024-08-11 09:19:40,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.622e+01 2.876e+01 3.425e+01 6.209e+01, threshold=5.753e+01, percent-clipped=1.0 2024-08-11 09:19:54,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1024070.0, ans=0.0 2024-08-11 09:20:05,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1024170.0, ans=0.125 2024-08-11 09:20:35,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1024270.0, ans=0.125 2024-08-11 09:20:37,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1024270.0, ans=0.125 2024-08-11 09:20:38,389 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 09:20:39,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1024270.0, ans=0.1 2024-08-11 09:20:56,597 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 09:20:57,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1024370.0, ans=0.125 2024-08-11 09:20:58,626 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 09:21:00,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1000, loss[loss=0.09751, beats_loss=0.01338, ecapa_loss=0.000183, whisper_loss=0.0823, over 21518.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01142, ecapa_loss=0.0001918, whisper_loss=0.09134, over 3823210.93 frames. 
], batch size: 90, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:21:02,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1024470.0, ans=0.1 2024-08-11 09:21:32,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1024570.0, ans=0.04949747468305833 2024-08-11 09:21:48,928 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-11 09:22:10,178 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 09:22:10,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1024770.0, ans=0.0 2024-08-11 09:22:11,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1024870.0, ans=0.0 2024-08-11 09:22:16,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1024870.0, ans=0.125 2024-08-11 09:22:28,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1024870.0, ans=0.025 2024-08-11 09:22:32,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1050, loss[loss=0.1055, beats_loss=0.009407, ecapa_loss=0.0002015, whisper_loss=0.09406, over 15068.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0114, ecapa_loss=0.000192, whisper_loss=0.09093, over 3813049.60 frames. 
], batch size: 60, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:22:39,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.754e+01 3.061e+01 3.548e+01 9.955e+01, threshold=6.122e+01, percent-clipped=1.0 2024-08-11 09:23:01,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1025070.0, ans=0.125 2024-08-11 09:23:08,938 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:23:11,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2024-08-11 09:23:46,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1025270.0, ans=0.125 2024-08-11 09:24:09,635 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:24:14,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1025370.0, ans=0.125 2024-08-11 09:24:15,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1025370.0, ans=0.125 2024-08-11 09:24:21,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1100, loss[loss=0.09946, beats_loss=0.01101, ecapa_loss=0.0001688, whisper_loss=0.08676, over 18045.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01139, ecapa_loss=0.0001916, whisper_loss=0.09102, over 3825746.23 frames. 
], batch size: 68, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:24:24,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1025470.0, ans=0.125 2024-08-11 09:24:31,806 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.935e+05 2024-08-11 09:24:39,369 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 09:24:44,644 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 09:25:01,966 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 09:25:08,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.36 vs. limit=22.5 2024-08-11 09:25:47,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1025870.0, ans=0.05 2024-08-11 09:26:01,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1025870.0, ans=0.125 2024-08-11 09:26:08,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1150, loss[loss=0.1173, beats_loss=0.01189, ecapa_loss=0.0001824, whisper_loss=0.1035, over 19766.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01127, ecapa_loss=0.0001935, whisper_loss=0.09158, over 3851680.15 frames. 
], batch size: 80, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:26:14,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.696e+01 3.045e+01 3.408e+01 7.482e+01, threshold=6.090e+01, percent-clipped=2.0 2024-08-11 09:26:15,955 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.440e-01 2024-08-11 09:26:25,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1025970.0, ans=0.125 2024-08-11 09:26:28,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1026070.0, ans=0.0 2024-08-11 09:26:37,482 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 09:26:57,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1026170.0, ans=0.0 2024-08-11 09:27:12,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1026270.0, ans=0.125 2024-08-11 09:27:43,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1026370.0, ans=0.125 2024-08-11 09:27:47,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2024-08-11 09:27:54,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1200, loss[loss=0.1157, beats_loss=0.01062, ecapa_loss=0.0001804, whisper_loss=0.1033, over 18749.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01121, ecapa_loss=0.0001931, whisper_loss=0.09244, over 3841111.22 frames. ], batch size: 71, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:28:21,897 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
19 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 09:28:22,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1026570.0, ans=0.1 2024-08-11 09:28:39,405 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 09:28:46,469 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 09:28:48,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1026770.0, ans=0.0 2024-08-11 09:28:59,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1026870.0, ans=0.125 2024-08-11 09:29:04,974 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 09:29:12,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1250, loss[loss=0.09046, beats_loss=0.01142, ecapa_loss=0.0001884, whisper_loss=0.07715, over 19733.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01136, ecapa_loss=0.0001924, whisper_loss=0.09178, over 3818786.73 frames. ], batch size: 81, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:29:17,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.549e+01 2.780e+01 3.273e+01 6.263e+01, threshold=5.560e+01, percent-clipped=1.0 2024-08-11 09:29:21,575 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 09:29:22,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1026970.0, ans=0.125 2024-08-11 09:29:23,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1026970.0, ans=0.125 2024-08-11 09:29:37,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1027070.0, ans=0.1 2024-08-11 09:29:50,026 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 09:29:50,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1027170.0, ans=0.0 2024-08-11 09:30:04,432 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 09:30:20,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1027370.0, ans=0.125 2024-08-11 09:30:27,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1300, loss[loss=0.1082, beats_loss=0.01324, ecapa_loss=0.000168, whisper_loss=0.09328, over 20136.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0113, ecapa_loss=0.0001933, whisper_loss=0.09211, over 3819999.44 frames. 
], batch size: 80, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:30:27,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1027470.0, ans=0.0 2024-08-11 09:30:44,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1027570.0, ans=0.125 2024-08-11 09:30:53,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2024-08-11 09:30:57,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1027570.0, ans=0.0 2024-08-11 09:31:03,198 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 8 from Vox, 38 fro AS 2024-08-11 09:31:10,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1027670.0, ans=0.0 2024-08-11 09:31:17,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1027770.0, ans=10.0 2024-08-11 09:31:18,197 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 09:31:44,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1350, loss[loss=0.1126, beats_loss=0.01135, ecapa_loss=0.0001751, whisper_loss=0.09954, over 23930.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01142, ecapa_loss=0.0001923, whisper_loss=0.09177, over 3824370.37 frames. ], batch size: 94, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:31:44,941 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 09:31:49,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.558e+01 2.922e+01 3.559e+01 4.960e+01, threshold=5.843e+01, percent-clipped=0.0 2024-08-11 09:31:57,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1027970.0, ans=0.05 2024-08-11 09:32:00,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1028070.0, ans=0.0 2024-08-11 09:32:03,424 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 09:32:09,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-11 09:32:14,459 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 09:32:34,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=12.0 2024-08-11 09:32:35,284 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 09:32:54,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1028370.0, ans=0.125 2024-08-11 09:32:59,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1400, loss[loss=0.1027, beats_loss=0.01249, ecapa_loss=0.0001854, whisper_loss=0.08836, over 22038.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01145, ecapa_loss=0.0001912, whisper_loss=0.09196, over 3843560.21 frames. 
], batch size: 90, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:33:06,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1028470.0, ans=15.0 2024-08-11 09:33:12,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2024-08-11 09:33:32,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=12.0 2024-08-11 09:33:36,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1028670.0, ans=0.2 2024-08-11 09:33:38,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2024-08-11 09:33:44,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1028770.0, ans=0.95 2024-08-11 09:33:50,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1028770.0, ans=0.5 2024-08-11 09:34:07,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1028870.0, ans=0.0 2024-08-11 09:34:08,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1028870.0, ans=0.125 2024-08-11 09:34:28,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1450, loss[loss=0.1047, beats_loss=0.01279, ecapa_loss=0.0001687, whisper_loss=0.09025, over 21253.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01144, ecapa_loss=0.0001899, whisper_loss=0.09234, over 3829710.11 frames. 
], batch size: 83, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:34:33,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.516e+01 2.871e+01 3.149e+01 4.386e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 09:34:48,125 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 09:34:50,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1029070.0, ans=0.0 2024-08-11 09:34:51,040 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 09:34:57,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1029170.0, ans=0.0 2024-08-11 09:35:20,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1029270.0, ans=0.07 2024-08-11 09:35:34,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-11 09:35:35,607 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 09:35:42,245 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-11 09:35:44,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1029370.0, ans=10.0 2024-08-11 09:35:48,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1500, loss[loss=0.1164, beats_loss=0.01046, ecapa_loss=0.0001913, whisper_loss=0.104, over 16569.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01135, ecapa_loss=0.0001892, whisper_loss=0.0928, over 3833684.81 frames. 
], batch size: 62, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:35:49,590 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 09:35:57,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-08-11 09:36:11,211 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 09:36:16,113 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 09:36:50,584 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 09:37:07,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1550, loss[loss=0.09406, beats_loss=0.01314, ecapa_loss=0.000197, whisper_loss=0.07895, over 19422.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001898, whisper_loss=0.09277, over 3847432.68 frames. ], batch size: 81, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:37:07,625 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 09:37:11,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.727e+01 2.976e+01 3.507e+01 6.642e+01, threshold=5.952e+01, percent-clipped=2.0 2024-08-11 09:37:25,418 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 09:37:33,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1030070.0, ans=0.125 2024-08-11 09:37:46,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1030170.0, ans=15.0 2024-08-11 09:37:48,997 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
12 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 09:37:51,889 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 09:38:10,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1030370.0, ans=0.125 2024-08-11 09:38:25,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2024-08-11 09:38:26,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1600, loss[loss=0.1123, beats_loss=0.009561, ecapa_loss=0.0001824, whisper_loss=0.1009, over 17756.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01136, ecapa_loss=0.0001899, whisper_loss=0.09221, over 3859162.79 frames. ], batch size: 68, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:38:46,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2024-08-11 09:38:54,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1030570.0, ans=0.2 2024-08-11 09:39:12,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1030770.0, ans=0.125 2024-08-11 09:39:12,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1030770.0, ans=0.125 2024-08-11 09:39:12,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2024-08-11 09:39:13,179 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 09:39:18,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1030770.0, ans=0.0 2024-08-11 09:39:33,296 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 09:39:37,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1030870.0, ans=0.125 2024-08-11 09:39:42,085 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 09:39:43,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1650, loss[loss=0.09795, beats_loss=0.01308, ecapa_loss=0.0001684, whisper_loss=0.08319, over 20982.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001908, whisper_loss=0.09237, over 3828033.82 frames. ], batch size: 83, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:39:45,548 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 09:39:48,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-08-11 09:39:48,496 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.611e+01 2.904e+01 3.448e+01 5.228e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-11 09:39:56,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1030970.0, ans=0.2 2024-08-11 09:40:06,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1031070.0, ans=0.125 2024-08-11 09:40:14,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. 
limit=10.0 2024-08-11 09:40:25,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1031170.0, ans=0.1 2024-08-11 09:40:29,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1031270.0, ans=0.05 2024-08-11 09:40:39,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1031270.0, ans=0.2 2024-08-11 09:40:41,649 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 09:40:54,654 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 09:40:57,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1700, loss[loss=0.1144, beats_loss=0.008221, ecapa_loss=0.0001741, whisper_loss=0.1045, over 17663.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.000191, whisper_loss=0.09277, over 3809740.10 frames. ], batch size: 63, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:41:00,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1031470.0, ans=0.0 2024-08-11 09:41:33,442 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 09:41:47,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-11 09:41:55,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1031870.0, ans=0.0 2024-08-11 09:41:59,838 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
28 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-11 09:42:09,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1750, loss[loss=0.08871, beats_loss=0.009503, ecapa_loss=0.0002119, whisper_loss=0.07708, over 18505.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001901, whisper_loss=0.09306, over 3821427.09 frames. ], batch size: 76, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:42:09,369 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 09:42:11,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2024-08-11 09:42:13,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.694e+01 3.096e+01 3.648e+01 5.495e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 09:42:54,031 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 09:43:17,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1032370.0, ans=0.0 2024-08-11 09:43:21,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1800, loss[loss=0.1014, beats_loss=0.01272, ecapa_loss=0.0001892, whisper_loss=0.08682, over 16630.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.0001906, whisper_loss=0.09283, over 3825963.78 frames. ], batch size: 66, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:43:25,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1032470.0, ans=0.125 2024-08-11 09:43:33,698 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 09:44:08,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1032770.0, ans=0.125 2024-08-11 09:44:12,829 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 09:44:16,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1032770.0, ans=0.5 2024-08-11 09:44:21,295 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 09:44:35,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1850, loss[loss=0.1041, beats_loss=0.01202, ecapa_loss=0.0001815, whisper_loss=0.09028, over 17348.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01125, ecapa_loss=0.00019, whisper_loss=0.09332, over 3857495.25 frames. ], batch size: 68, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:44:39,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.564e+01 2.931e+01 3.381e+01 4.621e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-11 09:44:58,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1033070.0, ans=0.0 2024-08-11 09:45:01,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1033070.0, ans=0.02 2024-08-11 09:45:06,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-11 09:45:06,969 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 09:45:25,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.86 vs. 
limit=15.0 2024-08-11 09:45:46,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1033470.0, ans=0.0 2024-08-11 09:45:47,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1900, loss[loss=0.08966, beats_loss=0.01238, ecapa_loss=0.0002454, whisper_loss=0.07483, over 16147.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0001917, whisper_loss=0.09287, over 3852795.43 frames. ], batch size: 70, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:45:54,517 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 09:46:01,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-11 09:46:09,395 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 09:46:20,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033670.0, ans=0.1 2024-08-11 09:46:45,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-08-11 09:46:50,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1033870.0, ans=0.1 2024-08-11 09:47:00,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 1950, loss[loss=0.1207, beats_loss=0.008573, ecapa_loss=0.0002444, whisper_loss=0.1097, over 14233.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0001948, whisper_loss=0.09309, over 3838014.11 frames. ], batch size: 55, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:47:00,810 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 09:47:03,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1033970.0, ans=0.125 2024-08-11 09:47:05,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.682e+01 2.998e+01 3.589e+01 5.098e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 09:47:05,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1033970.0, ans=0.2 2024-08-11 09:47:15,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1034070.0, ans=0.125 2024-08-11 09:47:24,930 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 09:47:29,081 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 09:47:35,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034170.0, ans=0.1 2024-08-11 09:47:37,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1034170.0, ans=0.125 2024-08-11 09:47:38,309 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 09:47:43,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-11 09:47:56,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034270.0, ans=0.1 2024-08-11 09:48:13,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2000, loss[loss=0.1127, beats_loss=0.01117, ecapa_loss=0.0002125, whisper_loss=0.0994, over 22455.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0001953, whisper_loss=0.09288, over 3857717.03 frames. ], batch size: 91, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:48:17,759 WARNING [optim.py:496] (3/4) Scaling gradients by 0.059571195393800735, model_norm_threshold=59.96577072143555 2024-08-11 09:48:17,985 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.97, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.877e+05, grad_sumsq=1.108e+05, orig_rms_sq=8.917e+00 2024-08-11 09:48:25,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1034470.0, ans=0.2 2024-08-11 09:48:25,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2024-08-11 09:48:34,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1034570.0, ans=0.0 2024-08-11 09:48:40,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1034570.0, ans=0.0 2024-08-11 09:48:51,701 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.315e+02 2024-08-11 09:48:52,633 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 09:49:08,486 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 09:49:15,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1034870.0, ans=0.1 2024-08-11 09:49:21,897 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
38 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-11 09:49:23,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1034870.0, ans=10.0 2024-08-11 09:49:27,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2050, loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0002268, whisper_loss=0.08781, over 20289.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01132, ecapa_loss=0.0001962, whisper_loss=0.09251, over 3879245.39 frames. ], batch size: 84, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:49:31,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.681e+01 2.944e+01 3.350e+01 1.007e+03, threshold=5.888e+01, percent-clipped=2.0 2024-08-11 09:49:32,620 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:49:34,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2024-08-11 09:49:35,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1034970.0, ans=0.125 2024-08-11 09:50:01,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1035170.0, ans=0.2 2024-08-11 09:50:22,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1035270.0, ans=0.125 2024-08-11 09:50:40,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2100, loss[loss=0.1122, beats_loss=0.01001, ecapa_loss=0.000255, whisper_loss=0.09963, over 18166.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01143, ecapa_loss=0.000195, whisper_loss=0.09208, over 3865549.21 frames. 
], batch size: 77, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:51:00,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035570.0, ans=0.1 2024-08-11 09:51:32,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1035770.0, ans=0.125 2024-08-11 09:51:33,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1035770.0, ans=0.125 2024-08-11 09:51:46,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.866e-01 2024-08-11 09:51:54,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2150, loss[loss=0.06179, beats_loss=0.01524, ecapa_loss=0.0001763, whisper_loss=0.04479, over 12679.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0115, ecapa_loss=0.0001944, whisper_loss=0.09191, over 3841749.08 frames. ], batch size: 53, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:51:57,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1035970.0, ans=0.125 2024-08-11 09:51:57,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1035970.0, ans=0.0 2024-08-11 09:51:58,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.546e+01 2.848e+01 3.381e+01 6.507e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 09:51:59,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2024-08-11 09:52:05,756 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 09:52:26,106 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 09:52:29,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1036170.0, ans=0.0 2024-08-11 09:52:30,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1036170.0, ans=0.1 2024-08-11 09:52:37,861 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 09:52:38,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1036270.0, ans=0.125 2024-08-11 09:53:04,063 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 09:53:06,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2200, loss[loss=0.09573, beats_loss=0.0117, ecapa_loss=0.0001715, whisper_loss=0.08231, over 20738.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01142, ecapa_loss=0.0001941, whisper_loss=0.09307, over 3861442.43 frames. ], batch size: 81, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:53:10,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1036470.0, ans=0.125 2024-08-11 09:53:16,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0 2024-08-11 09:53:17,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2024-08-11 09:54:15,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2250, loss[loss=0.09259, beats_loss=0.01416, ecapa_loss=0.0001652, whisper_loss=0.07679, over 22826.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01147, ecapa_loss=0.0001951, whisper_loss=0.09287, over 3856072.26 frames. ], batch size: 91, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:54:15,629 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 09:54:17,001 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 09:54:19,418 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.681e+01 2.914e+01 3.367e+01 5.391e+01, threshold=5.828e+01, percent-clipped=0.0 2024-08-11 09:54:25,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1036970.0, ans=0.125 2024-08-11 09:54:28,759 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.415e+00 2024-08-11 09:54:58,227 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 09:54:58,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1037270.0, ans=0.125 2024-08-11 09:55:03,430 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 09:55:09,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1037370.0, ans=0.1 2024-08-11 09:55:20,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1037470.0, ans=0.0 2024-08-11 09:55:21,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2300, loss[loss=0.09739, beats_loss=0.01156, ecapa_loss=0.0002263, whisper_loss=0.08356, over 14336.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0001968, whisper_loss=0.0938, over 3889822.57 frames. 
], batch size: 57, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:55:29,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037470.0, ans=0.1 2024-08-11 09:55:40,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1037570.0, ans=0.125 2024-08-11 09:55:42,914 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 09:55:47,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037670.0, ans=0.1 2024-08-11 09:55:54,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1037670.0, ans=0.125 2024-08-11 09:56:04,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1037770.0, ans=0.125 2024-08-11 09:56:16,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-11 09:56:27,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2350, loss[loss=0.1091, beats_loss=0.01199, ecapa_loss=0.0001939, whisper_loss=0.09522, over 22540.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.000197, whisper_loss=0.09416, over 3880746.98 frames. ], batch size: 92, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:56:31,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.661e+01 3.016e+01 3.402e+01 1.211e+02, threshold=6.032e+01, percent-clipped=3.0 2024-08-11 09:56:39,644 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 09:56:47,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1038070.0, ans=0.0 2024-08-11 09:56:58,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1038170.0, ans=0.0 2024-08-11 09:57:18,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.63 vs. limit=10.0 2024-08-11 09:57:21,461 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 31 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 09:57:27,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1038370.0, ans=0.125 2024-08-11 09:57:31,943 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 09:57:32,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-08-11 09:57:33,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2400, loss[loss=0.109, beats_loss=0.01102, ecapa_loss=0.0001895, whisper_loss=0.09608, over 18134.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01138, ecapa_loss=0.0001974, whisper_loss=0.09373, over 3887237.88 frames. ], batch size: 72, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:57:33,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1038470.0, ans=0.125 2024-08-11 09:57:34,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=15.0 2024-08-11 09:57:39,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1038470.0, ans=0.1 2024-08-11 09:57:47,440 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 09:57:48,740 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 09:57:56,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=12.0 2024-08-11 09:58:09,926 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 09:58:23,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1038770.0, ans=0.04949747468305833 2024-08-11 09:58:24,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=15.0 2024-08-11 09:58:31,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1038870.0, ans=0.125 2024-08-11 09:58:39,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2450, loss[loss=0.1051, beats_loss=0.01032, ecapa_loss=0.0001982, whisper_loss=0.09275, over 23282.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01139, ecapa_loss=0.0001968, whisper_loss=0.09323, over 3842865.55 frames. 
], batch size: 92, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:58:43,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.701e+01 2.979e+01 3.423e+01 5.204e+01, threshold=5.958e+01, percent-clipped=0.0 2024-08-11 09:58:52,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1039070.0, ans=0.1 2024-08-11 09:58:53,669 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 09:58:55,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1039070.0, ans=0.125 2024-08-11 09:59:03,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1039070.0, ans=0.0 2024-08-11 09:59:04,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1039170.0, ans=0.1 2024-08-11 09:59:06,828 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 09:59:08,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1039170.0, ans=0.125 2024-08-11 09:59:12,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1039170.0, ans=0.09899494936611666 2024-08-11 09:59:30,167 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 09:59:40,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1039370.0, ans=0.0 2024-08-11 09:59:44,196 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2500, loss[loss=0.09263, beats_loss=0.01195, ecapa_loss=0.0002514, whisper_loss=0.07817, over 19564.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01141, ecapa_loss=0.0001969, whisper_loss=0.09288, over 3866047.24 frames. ], batch size: 83, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:59:53,521 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 09:59:58,612 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 10:00:05,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=22.5 2024-08-11 10:00:11,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1039670.0, ans=0.95 2024-08-11 10:00:12,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1039670.0, ans=0.1 2024-08-11 10:00:14,527 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 10:00:24,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1039770.0, ans=0.125 2024-08-11 10:00:24,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-11 10:00:49,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2550, loss[loss=0.1081, beats_loss=0.01214, ecapa_loss=0.0001549, whisper_loss=0.09441, over 17404.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01142, ecapa_loss=0.0001968, whisper_loss=0.09258, over 3890143.96 frames. 
], batch size: 67, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 10:00:50,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1039970.0, ans=0.2 2024-08-11 10:00:57,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.767e+01 3.292e+01 3.693e+01 5.376e+01, threshold=6.584e+01, percent-clipped=0.0 2024-08-11 10:00:57,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039970.0, ans=0.1 2024-08-11 10:01:48,719 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 10:01:50,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-11 10:01:59,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2600, loss[loss=0.1249, beats_loss=0.009506, ecapa_loss=0.0001947, whisper_loss=0.1135, over 20893.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01134, ecapa_loss=0.0001976, whisper_loss=0.09282, over 3881147.09 frames. ], batch size: 81, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:02:08,455 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 10:02:14,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1040570.0, ans=0.125 2024-08-11 10:02:17,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1040570.0, ans=0.125 2024-08-11 10:02:22,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1040570.0, ans=0.125 2024-08-11 10:02:33,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1040670.0, ans=0.0 2024-08-11 10:02:34,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1040670.0, ans=0.2 2024-08-11 10:02:43,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-08-11 10:02:55,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1040870.0, ans=0.125 2024-08-11 10:03:05,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2650, loss[loss=0.1146, beats_loss=0.01077, ecapa_loss=0.0002026, whisper_loss=0.1018, over 22997.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01141, ecapa_loss=0.0001971, whisper_loss=0.09243, over 3880807.05 frames. ], batch size: 92, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:03:09,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.707e+01 2.925e+01 3.318e+01 6.568e+01, threshold=5.849e+01, percent-clipped=0.0 2024-08-11 10:03:11,034 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 10:03:17,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-11 10:03:27,781 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 10:03:41,238 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 10:03:58,882 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 10:04:06,290 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 10:04:11,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2700, loss[loss=0.1092, beats_loss=0.01129, ecapa_loss=0.0001978, whisper_loss=0.0959, over 22377.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0001961, whisper_loss=0.0934, over 3878623.31 frames. ], batch size: 89, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:04:23,537 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 10:04:32,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1041570.0, ans=0.125 2024-08-11 10:04:35,380 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 10:04:42,444 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 10:04:43,687 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 10:04:55,705 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 10:05:11,539 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 10:05:16,838 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 10:05:18,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2750, loss[loss=0.1106, beats_loss=0.01284, ecapa_loss=0.0001803, whisper_loss=0.09596, over 22688.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01136, ecapa_loss=0.0001961, whisper_loss=0.09341, over 3870742.15 frames. ], batch size: 90, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:05:19,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1041970.0, ans=0.125 2024-08-11 10:05:21,196 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 10:05:22,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.665e+01 2.980e+01 3.281e+01 5.234e+01, threshold=5.959e+01, percent-clipped=0.0 2024-08-11 10:05:36,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1042070.0, ans=0.0 2024-08-11 10:05:40,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1042070.0, ans=0.0 2024-08-11 10:05:40,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1042070.0, ans=0.125 2024-08-11 10:05:41,837 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-11 10:06:02,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1042270.0, ans=0.1 2024-08-11 10:06:08,525 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
33 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 10:06:24,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2800, loss[loss=0.09794, beats_loss=0.01232, ecapa_loss=0.0002562, whisper_loss=0.08306, over 18295.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01124, ecapa_loss=0.0001974, whisper_loss=0.09423, over 3865145.56 frames. ], batch size: 78, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:06:26,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1042470.0, ans=0.2 2024-08-11 10:06:28,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1042470.0, ans=0.0 2024-08-11 10:06:32,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1042470.0, ans=0.125 2024-08-11 10:06:36,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1042570.0, ans=0.125 2024-08-11 10:06:41,057 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 10:06:42,468 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-11 10:06:44,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1042570.0, ans=0.125 2024-08-11 10:06:53,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1042670.0, ans=0.0 2024-08-11 10:07:00,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1042670.0, ans=0.1 2024-08-11 10:07:29,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2850, loss[loss=0.09621, beats_loss=0.01218, ecapa_loss=0.0001926, whisper_loss=0.08211, over 15207.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01129, ecapa_loss=0.0001985, whisper_loss=0.09441, over 3862696.50 frames. ], batch size: 60, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:07:32,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1042970.0, ans=0.1 2024-08-11 10:07:33,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.749e+01 2.990e+01 3.438e+01 5.063e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 10:07:43,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1043070.0, ans=0.125 2024-08-11 10:07:51,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1043070.0, ans=15.0 2024-08-11 10:07:57,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1043170.0, ans=0.07 2024-08-11 10:08:01,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1043170.0, ans=0.125 2024-08-11 10:08:18,875 INFO [scaling.py:1120] 
(3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.009e-01 2024-08-11 10:08:18,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1043270.0, ans=0.0 2024-08-11 10:08:35,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2900, loss[loss=0.09808, beats_loss=0.009503, ecapa_loss=0.0002363, whisper_loss=0.08622, over 16212.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01131, ecapa_loss=0.0002003, whisper_loss=0.09413, over 3869256.92 frames. ], batch size: 66, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:08:43,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-11 10:09:20,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1043770.0, ans=0.0 2024-08-11 10:09:24,022 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.908e-01 2024-08-11 10:09:24,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1043770.0, ans=0.125 2024-08-11 10:09:26,007 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 10:09:42,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 2950, loss[loss=0.1126, beats_loss=0.009826, ecapa_loss=0.0001944, whisper_loss=0.1009, over 23239.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01126, ecapa_loss=0.0002003, whisper_loss=0.09435, over 3907217.44 frames. 
], batch size: 91, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:09:45,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.608e+01 2.908e+01 3.326e+01 5.190e+01, threshold=5.815e+01, percent-clipped=0.0 2024-08-11 10:09:46,084 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 10:09:46,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=15.0 2024-08-11 10:09:55,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1044070.0, ans=0.125 2024-08-11 10:09:59,404 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 10:10:02,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1044070.0, ans=0.0 2024-08-11 10:10:03,383 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 10:10:15,128 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 10:10:25,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-08-11 10:10:36,106 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 21 from LS+wenet, 28 from Vox, 46 fro AS 2024-08-11 10:10:37,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-08-11 10:10:46,728 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
23 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-11 10:10:47,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3000, loss[loss=0.1337, beats_loss=0.009518, ecapa_loss=0.000221, whisper_loss=0.122, over 14084.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0112, ecapa_loss=0.0002008, whisper_loss=0.09494, over 3938759.73 frames. ], batch size: 55, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:10:47,948 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 10:11:27,189 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006456, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 10:11:45,338 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on SV_voxceleb1: loss=0.005368, beats_loss=0, ecapa_loss=0.0005368, whisper_loss=0, over 939242.00 frames. 2024-08-11 10:13:42,470 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on AT_audioset: loss=0.02512, beats_loss=0.02512, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 10:13:42,474 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 10:13:54,767 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-11 10:14:03,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-11 10:14:17,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1044670.0, ans=0.0 2024-08-11 10:14:20,065 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 10:14:20,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1044670.0, ans=0.02 2024-08-11 10:14:49,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3050, loss[loss=0.09521, beats_loss=0.01251, ecapa_loss=0.0002213, whisper_loss=0.08048, over 19081.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0113, ecapa_loss=0.0002009, whisper_loss=0.09397, over 3924435.97 frames. ], batch size: 78, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:14:50,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1044970.0, ans=0.1 2024-08-11 10:14:53,655 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.751e+01 3.093e+01 3.441e+01 4.563e+01, threshold=6.185e+01, percent-clipped=0.0 2024-08-11 10:15:06,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1045070.0, ans=0.0 2024-08-11 10:15:10,141 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 10:15:15,018 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 10:15:15,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5 2024-08-11 10:15:17,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-11 10:15:39,193 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
11 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 10:15:46,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. limit=10.0 2024-08-11 10:15:48,772 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 10:15:55,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1045470.0, ans=0.125 2024-08-11 10:15:56,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3100, loss[loss=0.1087, beats_loss=0.008762, ecapa_loss=0.0002269, whisper_loss=0.09765, over 18103.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01135, ecapa_loss=0.0002021, whisper_loss=0.09405, over 3942876.82 frames. ], batch size: 74, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:16:16,405 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 10:16:19,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1045570.0, ans=0.2 2024-08-11 10:16:33,849 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 10:16:37,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-08-11 10:16:44,285 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 10:16:49,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.82 vs. 
limit=22.5 2024-08-11 10:16:54,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1045870.0, ans=0.125 2024-08-11 10:16:54,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1045870.0, ans=0.05 2024-08-11 10:16:57,990 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 10:17:00,461 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0893079861998558, model_norm_threshold=61.852699279785156 2024-08-11 10:17:00,642 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.682e+05, grad_sumsq=5.221e+04, orig_rms_sq=8.968e+00 2024-08-11 10:17:03,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3150, loss[loss=0.1188, beats_loss=0.01189, ecapa_loss=0.0002089, whisper_loss=0.1048, over 21676.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01144, ecapa_loss=0.0002018, whisper_loss=0.09383, over 3941497.41 frames. ], batch size: 88, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:17:03,837 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.072e-03 2024-08-11 10:17:07,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.851e+01 3.278e+01 3.632e+01 6.926e+02, threshold=6.555e+01, percent-clipped=1.0 2024-08-11 10:17:42,184 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 10:17:58,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1046370.0, ans=10.0 2024-08-11 10:18:00,678 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 10:18:09,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3200, loss[loss=0.09817, beats_loss=0.01002, ecapa_loss=0.0002809, whisper_loss=0.08533, over 14609.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01149, ecapa_loss=0.0002015, whisper_loss=0.0932, over 3899255.51 frames. ], batch size: 59, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:18:13,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1046470.0, ans=0.2 2024-08-11 10:18:15,474 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 10:18:19,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1046470.0, ans=0.125 2024-08-11 10:18:20,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1046470.0, ans=0.0 2024-08-11 10:18:26,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046570.0, ans=0.1 2024-08-11 10:18:46,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046670.0, ans=0.1 2024-08-11 10:18:51,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2024-08-11 10:18:56,448 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 10:19:08,173 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 10:19:16,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3250, loss[loss=0.1112, beats_loss=0.01132, ecapa_loss=0.0002109, whisper_loss=0.09772, over 16010.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01146, ecapa_loss=0.0002007, whisper_loss=0.09311, over 3894880.60 frames. ], batch size: 65, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:19:20,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.734e+01 3.207e+01 3.832e+01 6.451e+01, threshold=6.414e+01, percent-clipped=0.0 2024-08-11 10:19:23,150 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 10:19:38,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1047070.0, ans=0.125 2024-08-11 10:19:57,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1047270.0, ans=0.125 2024-08-11 10:20:08,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2024-08-11 10:20:18,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1047370.0, ans=0.0 2024-08-11 10:20:19,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047370.0, ans=0.1 2024-08-11 10:20:22,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3300, loss[loss=0.08443, beats_loss=0.01156, ecapa_loss=0.0001792, whisper_loss=0.07108, over 14267.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01153, ecapa_loss=0.0001996, whisper_loss=0.09226, over 3885067.77 frames. ], batch size: 54, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:20:35,268 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 10:20:43,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1047570.0, ans=0.07 2024-08-11 10:20:45,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1047570.0, ans=0.2 2024-08-11 10:21:30,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3350, loss[loss=0.1095, beats_loss=0.009849, ecapa_loss=0.0002289, whisper_loss=0.09739, over 17074.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01148, ecapa_loss=0.0002, whisper_loss=0.09237, over 3895166.21 frames. ], batch size: 66, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:21:32,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1047970.0, ans=0.125 2024-08-11 10:21:34,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.767e+01 3.123e+01 3.740e+01 5.333e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-11 10:21:45,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1048070.0, ans=0.125 2024-08-11 10:22:00,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1048170.0, ans=0.0 2024-08-11 10:22:01,248 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 10:22:02,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0 2024-08-11 10:22:12,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1048270.0, ans=0.0 2024-08-11 10:22:14,804 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 10:22:37,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3400, loss[loss=0.1049, beats_loss=0.009085, ecapa_loss=0.0002038, whisper_loss=0.09381, over 18047.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01139, ecapa_loss=0.0002001, whisper_loss=0.0925, over 3866676.65 frames. ], batch size: 71, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:23:13,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1048670.0, ans=10.0 2024-08-11 10:23:20,147 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 10:23:29,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1048770.0, ans=0.125 2024-08-11 10:23:46,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3450, loss[loss=0.1124, beats_loss=0.01109, ecapa_loss=0.0001467, whisper_loss=0.09983, over 22157.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01132, ecapa_loss=0.0002001, whisper_loss=0.09314, over 3887361.32 frames. ], batch size: 84, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:23:50,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.572e+01 2.937e+01 3.389e+01 1.105e+02, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 10:24:02,196 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 10:24:09,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1049070.0, ans=0.125 2024-08-11 10:24:14,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1049170.0, ans=0.125 2024-08-11 10:24:30,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1049270.0, ans=0.1 2024-08-11 10:24:33,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1049270.0, ans=0.0 2024-08-11 10:24:42,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-08-11 10:24:49,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2024-08-11 10:24:55,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3500, loss[loss=0.09614, beats_loss=0.0136, ecapa_loss=0.0001868, whisper_loss=0.08067, over 21821.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0113, ecapa_loss=0.0002009, whisper_loss=0.09351, over 3889105.22 frames. ], batch size: 90, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:25:05,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. 
limit=15.0 2024-08-11 10:25:08,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1049570.0, ans=0.0 2024-08-11 10:25:11,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1049570.0, ans=0.125 2024-08-11 10:25:16,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1049570.0, ans=15.0 2024-08-11 10:25:25,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1049670.0, ans=0.2 2024-08-11 10:25:31,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1049670.0, ans=0.125 2024-08-11 10:25:50,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1049870.0, ans=0.0 2024-08-11 10:25:54,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1049870.0, ans=0.0 2024-08-11 10:26:02,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3550, loss[loss=0.1084, beats_loss=0.01245, ecapa_loss=0.000151, whisper_loss=0.09449, over 23756.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01136, ecapa_loss=0.0001995, whisper_loss=0.09311, over 3888121.98 frames. ], batch size: 90, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:26:07,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.679e+01 2.987e+01 3.672e+01 5.992e+01, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 10:26:10,120 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 10:26:15,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1050070.0, ans=0.0 2024-08-11 10:26:21,320 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 10:26:39,196 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 10:27:03,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2024-08-11 10:27:05,994 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 10:27:12,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3600, loss[loss=0.09692, beats_loss=0.01014, ecapa_loss=0.0002325, whisper_loss=0.08445, over 16078.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01141, ecapa_loss=0.000199, whisper_loss=0.09292, over 3880079.16 frames. ], batch size: 64, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:27:18,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1050470.0, ans=0.1 2024-08-11 10:27:18,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1050470.0, ans=0.125 2024-08-11 10:27:20,815 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 10:27:27,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1050570.0, ans=10.0 2024-08-11 10:27:37,297 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 10:27:42,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1050670.0, ans=0.07 2024-08-11 10:27:42,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2024-08-11 10:27:57,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1050770.0, ans=0.1 2024-08-11 10:28:18,482 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.716e-01 2024-08-11 10:28:22,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3650, loss[loss=0.0984, beats_loss=0.01205, ecapa_loss=0.0001706, whisper_loss=0.08465, over 15248.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0113, ecapa_loss=0.0001995, whisper_loss=0.09371, over 3866664.70 frames. ], batch size: 57, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:28:26,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.681e+01 3.041e+01 3.404e+01 5.123e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 10:28:38,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1051070.0, ans=0.07 2024-08-11 10:28:40,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1051070.0, ans=0.0 2024-08-11 10:28:42,636 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 10:28:43,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. 
limit=22.5 2024-08-11 10:28:51,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1051170.0, ans=0.2 2024-08-11 10:29:08,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-11 10:29:15,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1051270.0, ans=0.125 2024-08-11 10:29:19,587 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 10:29:33,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3700, loss[loss=0.09255, beats_loss=0.01222, ecapa_loss=0.00019, whisper_loss=0.07843, over 22287.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01128, ecapa_loss=0.000201, whisper_loss=0.09357, over 3845057.10 frames. ], batch size: 93, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:29:35,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.38 vs. limit=10.0 2024-08-11 10:29:36,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1051470.0, ans=0.05 2024-08-11 10:29:50,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. 
limit=15.0 2024-08-11 10:29:51,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051570.0, ans=0.1 2024-08-11 10:29:54,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1051570.0, ans=0.1 2024-08-11 10:30:01,617 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 10:30:05,797 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 10:30:10,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1051670.0, ans=0.0 2024-08-11 10:30:18,688 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 10:30:26,936 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 10:30:39,325 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 10:30:40,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-11 10:30:45,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3750, loss[loss=0.08664, beats_loss=0.01174, ecapa_loss=0.0002838, whisper_loss=0.07206, over 16982.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0002022, whisper_loss=0.09311, over 3848564.09 frames. ], batch size: 77, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:30:49,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.786e+01 3.057e+01 3.501e+01 5.299e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 10:30:55,325 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 10:31:02,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1052070.0, ans=0.0 2024-08-11 10:31:06,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1052070.0, ans=0.0 2024-08-11 10:31:12,472 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 10:31:21,129 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 10:31:35,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1052270.0, ans=0.125 2024-08-11 10:31:43,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1052370.0, ans=0.125 2024-08-11 10:31:55,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3800, loss[loss=0.1223, beats_loss=0.01019, ecapa_loss=0.0001819, whisper_loss=0.1103, over 23596.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002012, whisper_loss=0.09323, over 3869896.08 frames. ], batch size: 92, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:31:59,953 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 10:32:01,220 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 10:32:07,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1052470.0, ans=0.1 2024-08-11 10:32:08,418 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 10:32:08,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052570.0, ans=0.1 2024-08-11 10:32:21,896 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 10:32:31,240 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 38 from Vox, 30 fro AS 2024-08-11 10:32:34,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1052670.0, ans=0.0 2024-08-11 10:32:38,500 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 10:32:39,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1052770.0, ans=0.125 2024-08-11 10:32:40,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052770.0, ans=0.1 2024-08-11 10:32:48,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-11 10:32:50,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1052770.0, ans=0.0 2024-08-11 10:33:06,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3850, loss[loss=0.06915, beats_loss=0.01417, ecapa_loss=0.0001646, whisper_loss=0.05333, over 15617.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0114, ecapa_loss=0.0002007, whisper_loss=0.09359, over 3864869.41 frames. 
], batch size: 65, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:33:10,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.764e+01 3.232e+01 3.837e+01 5.936e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-11 10:33:22,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1053070.0, ans=0.125 2024-08-11 10:33:34,937 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-11 10:33:43,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1053170.0, ans=0.125 2024-08-11 10:33:57,918 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 10:34:05,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1053370.0, ans=0.125 2024-08-11 10:34:15,110 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 10:34:16,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3900, loss[loss=0.1137, beats_loss=0.01055, ecapa_loss=0.0002334, whisper_loss=0.1008, over 22678.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002014, whisper_loss=0.0939, over 3859897.95 frames. ], batch size: 89, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:34:26,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1053470.0, ans=0.2 2024-08-11 10:34:28,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. 
limit=22.5 2024-08-11 10:34:29,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1053570.0, ans=0.125 2024-08-11 10:34:31,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=22.5 2024-08-11 10:34:33,183 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-11 10:34:47,342 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 10:34:57,087 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 10:34:57,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-11 10:34:59,901 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-11 10:35:15,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2024-08-11 10:35:20,543 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 10:35:27,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 3950, loss[loss=0.07575, beats_loss=0.01614, ecapa_loss=0.0001288, whisper_loss=0.05833, over 17300.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01137, ecapa_loss=0.0002025, whisper_loss=0.09401, over 3871864.67 frames. ], batch size: 70, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:35:32,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.817e+01 3.170e+01 3.825e+01 1.516e+02, threshold=6.340e+01, percent-clipped=2.0 2024-08-11 10:35:44,860 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
33 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 10:35:51,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-11 10:35:54,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1054070.0, ans=0.125 2024-08-11 10:35:55,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1054070.0, ans=0.0 2024-08-11 10:36:03,057 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 10:36:19,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=12.0 2024-08-11 10:36:36,204 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 10:36:37,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1054370.0, ans=0.125 2024-08-11 10:36:42,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4000, loss[loss=0.1186, beats_loss=0.008256, ecapa_loss=0.000266, whisper_loss=0.1077, over 21768.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01134, ecapa_loss=0.0002033, whisper_loss=0.0935, over 3877494.93 frames. ], batch size: 89, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:36:44,877 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 10:36:46,681 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 10:36:49,909 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 10:36:51,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-11 10:37:00,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1054570.0, ans=0.125 2024-08-11 10:37:01,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1054570.0, ans=0.125 2024-08-11 10:37:03,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1054570.0, ans=0.0 2024-08-11 10:37:22,378 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 10:37:54,880 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 10:37:58,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4050, loss[loss=0.0867, beats_loss=0.01323, ecapa_loss=0.0002512, whisper_loss=0.07096, over 20102.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01125, ecapa_loss=0.0002041, whisper_loss=0.09404, over 3892744.08 frames. ], batch size: 90, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:38:03,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.646e+01 2.921e+01 3.336e+01 5.282e+01, threshold=5.841e+01, percent-clipped=0.0 2024-08-11 10:38:14,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1055070.0, ans=0.125 2024-08-11 10:38:15,776 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 10:38:29,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. 
limit=15.0 2024-08-11 10:38:48,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-08-11 10:38:59,888 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 10:39:15,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4100, loss[loss=0.1096, beats_loss=0.01172, ecapa_loss=0.000233, whisper_loss=0.09552, over 20479.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01119, ecapa_loss=0.000204, whisper_loss=0.09455, over 3904402.60 frames. ], batch size: 86, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:39:17,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-11 10:39:20,153 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 10:39:25,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1055470.0, ans=0.2 2024-08-11 10:39:25,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1055470.0, ans=0.125 2024-08-11 10:39:25,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1055470.0, ans=0.125 2024-08-11 10:39:40,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1055570.0, ans=0.125 2024-08-11 10:39:41,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055570.0, ans=0.1 2024-08-11 10:39:44,655 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1055570.0, ans=0.0 2024-08-11 10:39:47,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1055670.0, ans=0.1 2024-08-11 10:39:48,338 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 10:40:02,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1055770.0, ans=0.125 2024-08-11 10:40:20,944 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 10:40:26,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1055870.0, ans=0.025 2024-08-11 10:40:34,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4150, loss[loss=0.08977, beats_loss=0.01386, ecapa_loss=0.0001901, whisper_loss=0.07401, over 20607.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01122, ecapa_loss=0.0002044, whisper_loss=0.09409, over 3894863.82 frames. 
], batch size: 83, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:40:34,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1055970.0, ans=0.125 2024-08-11 10:40:38,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.691e+01 3.023e+01 3.383e+01 1.135e+02, threshold=6.046e+01, percent-clipped=2.0 2024-08-11 10:40:43,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1055970.0, ans=0.2 2024-08-11 10:40:48,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1056070.0, ans=0.0 2024-08-11 10:40:50,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1056070.0, ans=0.0 2024-08-11 10:40:58,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:40:58,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:41:00,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:41:25,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1056270.0, ans=0.125 2024-08-11 10:41:29,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1056270.0, ans=0.125 2024-08-11 10:41:41,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.01 vs. 
limit=15.0 2024-08-11 10:41:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1056370.0, ans=0.0 2024-08-11 10:41:44,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1056370.0, ans=0.125 2024-08-11 10:41:47,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1056470.0, ans=0.1 2024-08-11 10:41:48,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4200, loss[loss=0.08963, beats_loss=0.01337, ecapa_loss=0.0001768, whisper_loss=0.07448, over 16576.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01123, ecapa_loss=0.0002038, whisper_loss=0.09389, over 3906110.46 frames. ], batch size: 68, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:41:53,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2024-08-11 10:42:01,539 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
23 from LS+wenet, 23 from Vox, 46 from AS 2024-08-11 10:42:09,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1056570.0, ans=0.0 2024-08-11 10:42:10,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1056570.0, ans=0.125 2024-08-11 10:42:30,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1056670.0, ans=0.125 2024-08-11 10:42:38,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1056770.0, ans=0.125 2024-08-11 10:42:43,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1056770.0, ans=0.125 2024-08-11 10:42:54,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1056870.0, ans=0.125 2024-08-11 10:43:02,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4250, loss[loss=0.1228, beats_loss=0.01127, ecapa_loss=0.0001733, whisper_loss=0.1098, over 17795.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01119, ecapa_loss=0.0002037, whisper_loss=0.09462, over 3904081.24 frames. ], batch size: 69, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:43:06,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. 
limit=22.5 2024-08-11 10:43:07,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.666e+01 2.925e+01 3.281e+01 5.407e+01, threshold=5.850e+01, percent-clipped=0.0 2024-08-11 10:43:27,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1057070.0, ans=0.0 2024-08-11 10:44:15,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1057470.0, ans=0.0 2024-08-11 10:44:16,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4300, loss[loss=0.1079, beats_loss=0.009114, ecapa_loss=0.0002366, whisper_loss=0.09643, over 13533.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01116, ecapa_loss=0.0002038, whisper_loss=0.09442, over 3906559.75 frames. ], batch size: 54, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:44:28,062 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 10:44:56,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=15.0 2024-08-11 10:45:09,701 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 10:45:30,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1057870.0, ans=0.2 2024-08-11 10:45:30,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1057870.0, ans=0.025 2024-08-11 10:45:33,127 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 10:45:34,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4350, loss[loss=0.1046, beats_loss=0.01119, ecapa_loss=0.0001754, whisper_loss=0.09162, over 18489.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.0112, ecapa_loss=0.0002041, whisper_loss=0.09414, over 3909603.73 frames. ], batch size: 72, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:45:38,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.592e+01 2.860e+01 3.306e+01 4.790e+01, threshold=5.719e+01, percent-clipped=0.0 2024-08-11 10:45:51,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2024-08-11 10:46:30,754 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-11 10:46:33,738 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 from AS 2024-08-11 10:46:38,334 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 23 from Vox, 18 from AS 2024-08-11 10:46:51,155 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4400, loss[loss=0.1236, beats_loss=0.008948, ecapa_loss=0.0002086, whisper_loss=0.1126, over 21269.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0002045, whisper_loss=0.09354, over 3893487.35 frames. ], batch size: 84, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:46:57,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1058470.0, ans=0.125 2024-08-11 10:47:04,723 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 10:47:10,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. 
limit=22.5 2024-08-11 10:47:23,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1058670.0, ans=0.0 2024-08-11 10:47:26,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1058670.0, ans=0.05 2024-08-11 10:47:28,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-11 10:47:47,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1058770.0, ans=0.125 2024-08-11 10:48:13,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4450, loss[loss=0.09947, beats_loss=0.01069, ecapa_loss=0.0002284, whisper_loss=0.08649, over 15172.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01121, ecapa_loss=0.000205, whisper_loss=0.09354, over 3883453.46 frames. ], batch size: 63, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:48:17,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.805e+01 3.007e+01 3.333e+01 6.979e+01, threshold=6.014e+01, percent-clipped=1.0 2024-08-11 10:48:50,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1059170.0, ans=0.0 2024-08-11 10:49:00,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-11 10:49:02,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1059270.0, ans=0.125 2024-08-11 10:49:08,062 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS 2024-08-11 10:49:09,415 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 10:49:28,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1059470.0, ans=0.0 2024-08-11 10:49:29,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.07 vs. limit=22.5 2024-08-11 10:49:29,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4500, loss[loss=0.1063, beats_loss=0.01197, ecapa_loss=0.0001821, whisper_loss=0.09255, over 22667.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01126, ecapa_loss=0.0002035, whisper_loss=0.09354, over 3888652.97 frames. ], batch size: 90, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:29,302 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 10:50:43,006 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 10:50:44,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4550, loss[loss=0.1213, beats_loss=0.01163, ecapa_loss=0.0001988, whisper_loss=0.1077, over 22839.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01126, ecapa_loss=0.0002041, whisper_loss=0.09334, over 3887293.55 frames. ], batch size: 90, lr: 7.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:45,022 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 11 from Vox, 38 from AS 2024-08-11 10:50:45,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.73 vs. 
limit=15.0 2024-08-11 10:50:48,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.557e+01 2.865e+01 3.375e+01 6.211e+01, threshold=5.730e+01, percent-clipped=1.0 2024-08-11 10:51:02,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1060070.0, ans=0.125 2024-08-11 10:51:17,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1060170.0, ans=0.02 2024-08-11 10:51:27,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1060170.0, ans=0.125 2024-08-11 10:51:28,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1060170.0, ans=0.07 2024-08-11 10:51:51,922 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 10:51:54,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1060370.0, ans=0.125 2024-08-11 10:51:54,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1060370.0, ans=0.0 2024-08-11 10:51:56,378 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 39 from LS+wenet, 17 from Vox, 32 from AS 2024-08-11 10:52:00,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4600, loss[loss=0.1026, beats_loss=0.0132, ecapa_loss=0.0001803, whisper_loss=0.08758, over 22561.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01136, ecapa_loss=0.0002033, whisper_loss=0.09322, over 3873167.24 frames. ], batch size: 90, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:52:04,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. 
limit=15.0 2024-08-11 10:52:12,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-11 10:52:17,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1060570.0, ans=0.125 2024-08-11 10:52:23,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1060570.0, ans=0.2 2024-08-11 10:52:27,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1060570.0, ans=0.125 2024-08-11 10:52:31,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1060670.0, ans=0.1 2024-08-11 10:52:33,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1060670.0, ans=0.2 2024-08-11 10:52:41,189 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 from AS 2024-08-11 10:52:48,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1060770.0, ans=0.2 2024-08-11 10:53:20,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1060970.0, ans=0.0 2024-08-11 10:53:21,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4650, loss[loss=0.1084, beats_loss=0.01219, ecapa_loss=0.0001874, whisper_loss=0.09429, over 19060.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01144, ecapa_loss=0.0002035, whisper_loss=0.09221, over 3892822.67 frames. 
], batch size: 74, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:53:26,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.723e+01 3.113e+01 3.495e+01 7.663e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 10:53:33,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1060970.0, ans=0.0 2024-08-11 10:53:41,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1061070.0, ans=0.1 2024-08-11 10:54:21,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1061270.0, ans=0.125 2024-08-11 10:54:34,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1061370.0, ans=0.125 2024-08-11 10:54:42,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4700, loss[loss=0.09635, beats_loss=0.01148, ecapa_loss=0.0002217, whisper_loss=0.08266, over 20968.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01137, ecapa_loss=0.0002047, whisper_loss=0.09258, over 3855884.98 frames. 
], batch size: 91, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:54:47,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061470.0, ans=0.1 2024-08-11 10:54:48,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1061470.0, ans=0.0 2024-08-11 10:54:53,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1061470.0, ans=0.0 2024-08-11 10:54:55,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1061470.0, ans=0.125 2024-08-11 10:55:10,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1061570.0, ans=0.0 2024-08-11 10:55:16,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1061670.0, ans=0.125 2024-08-11 10:55:16,937 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 10:55:17,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1061670.0, ans=0.0 2024-08-11 10:55:35,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1061770.0, ans=0.125 2024-08-11 10:55:40,304 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 17 from Vox, 45 from AS 2024-08-11 10:55:56,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1061870.0, ans=0.07 2024-08-11 10:55:58,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061870.0, ans=0.1 2024-08-11 10:56:05,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4750, loss[loss=0.09944, beats_loss=0.01056, ecapa_loss=0.0002457, whisper_loss=0.08642, over 21339.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002029, whisper_loss=0.0932, over 3865804.83 frames. ], batch size: 90, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:56:10,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.759e+01 3.104e+01 3.569e+01 5.241e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-11 10:56:13,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1061970.0, ans=0.1 2024-08-11 10:56:26,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1062070.0, ans=0.125 2024-08-11 10:56:38,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1062170.0, ans=0.125 2024-08-11 10:56:57,650 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 10:57:02,989 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 10:57:18,777 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-11 10:57:31,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4800, loss[loss=0.1011, beats_loss=0.01166, ecapa_loss=0.0002338, whisper_loss=0.08714, over 21452.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0002029, whisper_loss=0.09338, over 3928904.00 frames. ], batch size: 92, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:58:03,597 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS 2024-08-11 10:58:08,111 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 10 from Vox, 29 from AS 2024-08-11 10:58:24,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1062770.0, ans=0.1 2024-08-11 10:58:38,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-11 10:58:55,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4850, loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001928, whisper_loss=0.08954, over 15881.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002024, whisper_loss=0.09338, over 3927705.16 frames. ], batch size: 58, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:59:00,364 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.634e+01 3.190e+01 3.671e+01 5.547e+01, threshold=6.379e+01, percent-clipped=0.0 2024-08-11 10:59:00,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1062970.0, ans=0.2 2024-08-11 10:59:06,987 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-11 10:59:38,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1063170.0, ans=0.0 2024-08-11 10:59:47,110 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 from AS 2024-08-11 11:00:00,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1063370.0, ans=0.2 2024-08-11 11:00:04,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1063370.0, ans=0.125 2024-08-11 11:00:15,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4900, loss[loss=0.1243, beats_loss=0.006887, ecapa_loss=0.000278, whisper_loss=0.1146, over 17199.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01141, ecapa_loss=0.0002035, whisper_loss=0.09341, over 3919791.69 frames. ], batch size: 71, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:00:28,483 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 from AS 2024-08-11 11:00:39,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063570.0, ans=0.1 2024-08-11 11:00:52,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2024-08-11 11:00:53,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2024-08-11 11:00:59,761 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 14 from Vox, 32 from AS 2024-08-11 11:01:16,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1063770.0, ans=0.125 2024-08-11 11:01:31,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-11 11:01:37,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 4950, loss[loss=0.1007, beats_loss=0.01268, ecapa_loss=0.0001852, whisper_loss=0.08621, over 22591.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01142, ecapa_loss=0.000203, whisper_loss=0.09293, over 3897656.97 frames. ], batch size: 89, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:01:43,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.682e+01 3.010e+01 3.354e+01 5.437e+01, threshold=6.020e+01, percent-clipped=0.0 2024-08-11 11:01:47,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1063970.0, ans=0.0 2024-08-11 11:02:22,788 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 from AS 2024-08-11 11:02:27,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.05 vs. limit=22.5 2024-08-11 11:02:31,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1064270.0, ans=0.0 2024-08-11 11:02:37,561 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 from AS 2024-08-11 11:02:38,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1064270.0, ans=15.0 2024-08-11 11:02:50,909 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
28 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 11:02:56,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064370.0, ans=0.1 2024-08-11 11:03:00,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5000, loss[loss=0.1105, beats_loss=0.01075, ecapa_loss=0.0002293, whisper_loss=0.09748, over 22448.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01135, ecapa_loss=0.0002057, whisper_loss=0.09349, over 3886360.75 frames. ], batch size: 93, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:03:05,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1064470.0, ans=0.035 2024-08-11 11:03:11,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=15.0 2024-08-11 11:03:32,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1064670.0, ans=0.125 2024-08-11 11:03:35,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1064670.0, ans=0.0 2024-08-11 11:03:40,359 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 11:03:49,096 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 31 from Vox, 28 from AS 2024-08-11 11:04:24,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1064970.0, ans=0.125 2024-08-11 11:04:24,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5050, loss[loss=0.09784, beats_loss=0.01257, ecapa_loss=0.0001907, whisper_loss=0.08337, over 15575.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01144, ecapa_loss=0.0002046, whisper_loss=0.09319, over 3907922.62 frames. 
], batch size: 63, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:04:30,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 2.899e+01 3.463e+01 4.526e+01, threshold=5.797e+01, percent-clipped=0.0 2024-08-11 11:04:50,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2024-08-11 11:04:56,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1065070.0, ans=0.125 2024-08-11 11:05:12,729 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 11:05:20,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1065270.0, ans=0.125 2024-08-11 11:05:21,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1065270.0, ans=0.0 2024-08-11 11:05:25,045 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 from AS 2024-08-11 11:05:47,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1065370.0, ans=0.125 2024-08-11 11:05:51,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1065370.0, ans=0.0 2024-08-11 11:05:53,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5100, loss[loss=0.09967, beats_loss=0.01415, ecapa_loss=0.000186, whisper_loss=0.08366, over 22097.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01144, ecapa_loss=0.0002037, whisper_loss=0.09385, over 3921499.79 frames. 
], batch size: 91, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:05:59,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2024-08-11 11:06:09,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-11 11:06:36,851 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 from AS 2024-08-11 11:06:52,746 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 11:06:55,458 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 11:06:56,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-11 11:06:58,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1065770.0, ans=0.2 2024-08-11 11:07:14,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065870.0, ans=0.1 2024-08-11 11:07:16,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5150, loss[loss=0.1136, beats_loss=0.0108, ecapa_loss=0.000202, whisper_loss=0.1008, over 22625.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01145, ecapa_loss=0.0002024, whisper_loss=0.09405, over 3920995.00 frames. 
], batch size: 91, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:07:22,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.743e+01 3.078e+01 3.597e+01 5.105e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-11 11:07:28,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1065970.0, ans=0.125 2024-08-11 11:07:31,089 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-11 11:08:14,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-08-11 11:08:30,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1066370.0, ans=0.125 2024-08-11 11:08:33,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5200, loss[loss=0.1052, beats_loss=0.008059, ecapa_loss=0.000185, whisper_loss=0.09528, over 15928.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.000202, whisper_loss=0.0944, over 3894210.41 frames. ], batch size: 58, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:08:37,902 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 11:08:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1066470.0, ans=0.125 2024-08-11 11:08:52,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1066570.0, ans=0.0 2024-08-11 11:08:52,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1066570.0, ans=0.0 2024-08-11 11:08:54,865 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 11:09:01,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1066570.0, ans=0.125 2024-08-11 11:09:29,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1066770.0, ans=0.0 2024-08-11 11:09:31,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1066770.0, ans=0.125 2024-08-11 11:09:33,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1066770.0, ans=0.125 2024-08-11 11:09:49,780 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 11:09:52,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5250, loss[loss=0.1072, beats_loss=0.01198, ecapa_loss=0.0001982, whisper_loss=0.09327, over 22261.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01138, ecapa_loss=0.0002013, whisper_loss=0.09401, over 3866484.56 frames. ], batch size: 91, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:09:57,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.555e+01 2.975e+01 3.407e+01 4.666e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 11:10:07,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1067070.0, ans=0.0 2024-08-11 11:10:40,967 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-11 11:11:03,826 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 11:11:04,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1067370.0, ans=0.2 2024-08-11 11:11:10,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5300, loss[loss=0.1033, beats_loss=0.01176, ecapa_loss=0.0001292, whisper_loss=0.0902, over 17187.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0113, ecapa_loss=0.000202, whisper_loss=0.09409, over 3850763.52 frames. ], batch size: 63, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:11:41,068 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 11:11:43,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-11 11:11:44,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2024-08-11 11:12:28,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. limit=10.0 2024-08-11 11:12:29,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5350, loss[loss=0.08596, beats_loss=0.01454, ecapa_loss=0.0002026, whisper_loss=0.06939, over 16557.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.0002021, whisper_loss=0.09389, over 3852962.23 frames. ], batch size: 69, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:12:36,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.785e+01 3.077e+01 3.493e+01 6.327e+01, threshold=6.155e+01, percent-clipped=1.0 2024-08-11 11:12:38,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2024-08-11 11:13:10,301 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 11 from Vox, 46 fro AS 2024-08-11 11:14:11,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1068370.0, ans=0.0 2024-08-11 11:14:14,089 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-11 11:14:15,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5400, loss[loss=0.1165, beats_loss=0.007161, ecapa_loss=0.0002142, whisper_loss=0.1072, over 16502.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01122, ecapa_loss=0.0002008, whisper_loss=0.09451, over 3860775.10 frames. ], batch size: 61, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:14:15,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1068470.0, ans=0.2 2024-08-11 11:14:28,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:31,700 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 11:14:37,786 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-11 11:14:47,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:51,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1068670.0, ans=0.025 2024-08-11 11:15:00,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=15.0 2024-08-11 11:15:02,885 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 11:15:21,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1068770.0, ans=0.125 2024-08-11 11:15:27,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1068770.0, ans=0.0 2024-08-11 11:15:35,200 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.944e+00 2024-08-11 11:15:37,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-08-11 11:15:40,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=15.0 2024-08-11 11:15:50,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5450, loss[loss=0.1136, beats_loss=0.01116, ecapa_loss=0.0002228, whisper_loss=0.1002, over 22333.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01123, ecapa_loss=0.0002018, whisper_loss=0.09405, over 3863519.13 frames. ], batch size: 90, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:15:52,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1068970.0, ans=0.2 2024-08-11 11:15:56,418 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 11:15:57,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.869e+01 3.117e+01 3.592e+01 6.207e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 11:16:10,820 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
12 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 11:16:19,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1069070.0, ans=0.125 2024-08-11 11:16:24,169 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 11:16:33,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1069170.0, ans=0.125 2024-08-11 11:16:37,915 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:16:41,013 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-11 11:16:58,222 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 11:17:35,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5500, loss[loss=0.09615, beats_loss=0.01467, ecapa_loss=0.0001731, whisper_loss=0.07974, over 14739.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01134, ecapa_loss=0.0002013, whisper_loss=0.09309, over 3848982.17 frames. ], batch size: 58, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:18:06,736 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 11:18:31,322 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 11:18:38,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069670.0, ans=0.0 2024-08-11 11:19:22,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5550, loss[loss=0.1041, beats_loss=0.01306, ecapa_loss=0.0001621, whisper_loss=0.08945, over 19219.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01134, ecapa_loss=0.0002006, whisper_loss=0.09351, over 3880896.27 frames. 
], batch size: 76, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:19:28,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.609e+01 2.954e+01 3.474e+01 6.484e+01, threshold=5.909e+01, percent-clipped=2.0 2024-08-11 11:19:36,068 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 11:19:38,999 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-11 11:19:58,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1070070.0, ans=0.0 2024-08-11 11:20:12,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070170.0, ans=0.1 2024-08-11 11:20:20,539 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-11 11:20:35,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-11 11:20:41,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=27.68 vs. limit=15.0 2024-08-11 11:20:44,400 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 11:20:54,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5600, loss[loss=0.1032, beats_loss=0.01141, ecapa_loss=0.0002422, whisper_loss=0.08938, over 21982.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01133, ecapa_loss=0.0002004, whisper_loss=0.09348, over 3913111.48 frames. ], batch size: 92, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:20:56,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.21 vs. 
limit=22.5 2024-08-11 11:21:21,031 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 11:21:24,021 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 11:21:26,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-08-11 11:21:27,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1070670.0, ans=0.0 2024-08-11 11:21:42,736 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-11 11:21:58,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1070870.0, ans=0.95 2024-08-11 11:22:07,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5650, loss[loss=0.09876, beats_loss=0.01145, ecapa_loss=0.0002479, whisper_loss=0.08483, over 16135.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01148, ecapa_loss=0.0001996, whisper_loss=0.09268, over 3925506.47 frames. ], batch size: 69, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:22:11,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.577e+01 2.929e+01 3.455e+01 8.964e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 11:22:14,693 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:22:44,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.64 vs. 
limit=22.5 2024-08-11 11:22:47,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1071170.0, ans=0.0 2024-08-11 11:23:01,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-11 11:23:16,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071370.0, ans=0.1 2024-08-11 11:23:25,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5700, loss[loss=0.1075, beats_loss=0.01117, ecapa_loss=0.0002378, whisper_loss=0.09391, over 21533.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01153, ecapa_loss=0.000199, whisper_loss=0.0927, over 3941482.10 frames. ], batch size: 93, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:23:32,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2024-08-11 11:23:38,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1071470.0, ans=0.125 2024-08-11 11:23:41,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-11 11:23:47,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1071570.0, ans=0.125 2024-08-11 11:24:16,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1071770.0, ans=0.2 2024-08-11 11:24:17,219 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 11:24:43,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1071970.0, ans=0.125 2024-08-11 11:24:43,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5750, loss[loss=0.1201, beats_loss=0.009499, ecapa_loss=0.0002014, whisper_loss=0.1086, over 21611.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0115, ecapa_loss=0.0002002, whisper_loss=0.09224, over 3911060.99 frames. ], batch size: 83, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:24:48,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.718e+01 3.107e+01 3.541e+01 5.804e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 11:24:49,005 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:25:21,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1072170.0, ans=0.125 2024-08-11 11:25:23,099 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 11:25:29,108 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 11:25:30,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072270.0, ans=0.1 2024-08-11 11:25:34,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1072270.0, ans=0.125 2024-08-11 11:25:56,778 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 35 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 11:26:03,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5800, loss[loss=0.09056, beats_loss=0.01628, ecapa_loss=0.000205, whisper_loss=0.07223, over 22352.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01151, ecapa_loss=0.0001995, whisper_loss=0.09245, over 3922637.62 frames. ], batch size: 95, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:26:17,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=12.0 2024-08-11 11:26:29,310 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 11:26:35,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0 2024-08-11 11:26:41,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-08-11 11:26:54,031 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 11:27:08,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1072870.0, ans=0.125 2024-08-11 11:27:18,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5850, loss[loss=0.1229, beats_loss=0.01054, ecapa_loss=0.0001957, whisper_loss=0.1104, over 22313.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01154, ecapa_loss=0.0002, whisper_loss=0.092, over 3928384.33 frames. ], batch size: 89, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:27:23,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.774e+01 3.139e+01 3.627e+01 6.860e+01, threshold=6.277e+01, percent-clipped=1.0 2024-08-11 11:27:24,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1072970.0, ans=0.125 2024-08-11 11:27:49,453 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 11:28:10,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1073270.0, ans=0.0 2024-08-11 11:28:27,365 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 11:28:29,838 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 11:28:31,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5900, loss[loss=0.1038, beats_loss=0.01375, ecapa_loss=0.0001914, whisper_loss=0.08817, over 13749.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002017, whisper_loss=0.09244, over 3911246.22 frames. ], batch size: 54, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:28:58,884 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 11:29:02,210 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 11:29:05,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1073670.0, ans=0.125 2024-08-11 11:29:17,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1073770.0, ans=0.015 2024-08-11 11:29:35,483 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 11:29:42,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 5950, loss[loss=0.1089, beats_loss=0.01017, ecapa_loss=0.00021, whisper_loss=0.09663, over 22346.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01146, ecapa_loss=0.0002013, whisper_loss=0.0923, over 3892768.68 frames. 
], batch size: 90, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:29:47,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.700e+01 3.029e+01 3.647e+01 6.302e+01, threshold=6.057e+01, percent-clipped=1.0 2024-08-11 11:29:55,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1073970.0, ans=0.125 2024-08-11 11:30:25,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-11 11:30:31,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2024-08-11 11:30:41,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1074370.0, ans=0.95 2024-08-11 11:30:52,378 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-11 11:30:56,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6000, loss[loss=0.1111, beats_loss=0.01277, ecapa_loss=0.000164, whisper_loss=0.09668, over 22309.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0115, ecapa_loss=0.0002009, whisper_loss=0.09248, over 3884326.67 frames. ], batch size: 88, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:30:56,102 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 11:31:34,863 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006404, whisper_loss=0.2522, over 922467.00 frames. 2024-08-11 11:31:52,367 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on SV_voxceleb1: loss=0.005252, beats_loss=0, ecapa_loss=0.0005252, whisper_loss=0, over 939242.00 frames. 
2024-08-11 11:33:21,223 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.6869, 1.5323, 1.7833, 1.6984, 2.2535, 1.5771, 1.7645, 1.8033], device='cuda:3') 2024-08-11 11:33:45,243 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on AT_audioset: loss=0.02539, beats_loss=0.02539, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 11:33:45,253 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 11:33:55,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1074470.0, ans=0.125 2024-08-11 11:34:16,247 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 11:34:37,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1074770.0, ans=0.125 2024-08-11 11:34:55,140 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 11:34:56,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1074870.0, ans=0.0 2024-08-11 11:34:58,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6050, loss[loss=0.09151, beats_loss=0.01281, ecapa_loss=0.0002324, whisper_loss=0.07638, over 16025.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.000202, whisper_loss=0.09324, over 3864047.21 frames. ], batch size: 67, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:34:59,381 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 11:35:03,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.749e+01 3.055e+01 3.427e+01 5.083e+01, threshold=6.111e+01, percent-clipped=0.0 2024-08-11 11:35:18,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-08-11 11:35:19,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-11 11:35:23,233 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 11:35:52,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1075270.0, ans=0.2 2024-08-11 11:35:53,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1075270.0, ans=0.2 2024-08-11 11:35:56,111 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 11:35:56,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1075270.0, ans=0.125 2024-08-11 11:36:07,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2024-08-11 11:36:14,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6100, loss[loss=0.1132, beats_loss=0.0109, ecapa_loss=0.0001728, whisper_loss=0.1006, over 20991.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01145, ecapa_loss=0.000202, whisper_loss=0.09274, over 3865637.86 frames. ], batch size: 80, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:36:18,875 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 11:36:27,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1075470.0, ans=0.125 2024-08-11 11:36:37,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075570.0, ans=0.1 2024-08-11 11:37:30,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6150, loss[loss=0.09608, beats_loss=0.01443, ecapa_loss=0.0001529, whisper_loss=0.08012, over 23726.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01137, ecapa_loss=0.0002025, whisper_loss=0.09306, over 3860444.56 frames. ], batch size: 93, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:37:34,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.684e+01 3.005e+01 3.339e+01 4.754e+01, threshold=6.009e+01, percent-clipped=0.0 2024-08-11 11:38:03,610 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 11:38:08,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1076170.0, ans=0.0 2024-08-11 11:38:09,577 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 11:38:09,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1076170.0, ans=0.035 2024-08-11 11:38:18,923 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 11:38:22,415 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 11:38:36,690 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 11:38:43,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6200, loss[loss=0.1075, beats_loss=0.01098, ecapa_loss=0.0001983, whisper_loss=0.09458, over 18274.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0114, ecapa_loss=0.0002014, whisper_loss=0.09293, over 3870526.70 frames. ], batch size: 73, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:39:04,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1076570.0, ans=22.5 2024-08-11 11:39:09,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1076570.0, ans=0.1 2024-08-11 11:39:23,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1076670.0, ans=0.0 2024-08-11 11:39:27,788 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 11:39:39,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1076770.0, ans=0.125 2024-08-11 11:39:45,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0 2024-08-11 11:39:59,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6250, loss[loss=0.1062, beats_loss=0.01253, ecapa_loss=0.0002396, whisper_loss=0.09125, over 21571.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01143, ecapa_loss=0.0002008, whisper_loss=0.09252, over 3907207.73 frames. 
], batch size: 90, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:40:03,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1076970.0, ans=0.2 2024-08-11 11:40:04,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.830e+01 2.972e+01 3.439e+01 5.876e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 11:40:04,723 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 11:40:06,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-11 11:40:08,953 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 11:40:10,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1076970.0, ans=0.125 2024-08-11 11:40:29,301 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 11:40:36,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1077170.0, ans=0.0 2024-08-11 11:40:41,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1077170.0, ans=0.0 2024-08-11 11:40:46,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1077270.0, ans=0.09899494936611666 2024-08-11 11:40:56,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1077270.0, ans=0.1 2024-08-11 11:40:57,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=22.5 2024-08-11 11:41:08,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1077370.0, ans=22.5 2024-08-11 11:41:12,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1077470.0, ans=0.0 2024-08-11 11:41:13,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6300, loss[loss=0.1054, beats_loss=0.01038, ecapa_loss=0.0002278, whisper_loss=0.0927, over 21414.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01138, ecapa_loss=0.0001994, whisper_loss=0.09323, over 3907864.75 frames. ], batch size: 89, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:41:13,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2024-08-11 11:41:16,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077470.0, ans=0.1 2024-08-11 11:41:23,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-11 11:41:28,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-11 11:41:29,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1077570.0, ans=0.125 2024-08-11 11:41:51,533 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 9 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 11:41:54,319 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
15 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-11 11:42:07,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1077770.0, ans=0.125 2024-08-11 11:42:09,657 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:42:19,568 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-11 11:42:24,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6350, loss[loss=0.06636, beats_loss=0.01411, ecapa_loss=0.0001298, whisper_loss=0.05095, over 16442.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01144, ecapa_loss=0.0001989, whisper_loss=0.09309, over 3904023.26 frames. ], batch size: 64, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:42:29,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.640e+01 2.866e+01 3.160e+01 1.102e+02, threshold=5.732e+01, percent-clipped=1.0 2024-08-11 11:42:39,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1078070.0, ans=0.1 2024-08-11 11:42:48,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1078070.0, ans=0.025 2024-08-11 11:42:49,888 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 11:43:23,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.16 vs. limit=22.5 2024-08-11 11:43:39,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6400, loss[loss=0.1173, beats_loss=0.01138, ecapa_loss=0.0001963, whisper_loss=0.104, over 23055.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01138, ecapa_loss=0.0002003, whisper_loss=0.09312, over 3885164.17 frames. 
], batch size: 90, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:44:10,541 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 11:44:26,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1078770.0, ans=10.0 2024-08-11 11:44:31,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-08-11 11:44:39,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1078870.0, ans=0.0 2024-08-11 11:44:42,680 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 11:44:45,663 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 11:44:48,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1078870.0, ans=0.1 2024-08-11 11:44:54,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1078870.0, ans=0.1 2024-08-11 11:44:55,994 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6450, loss[loss=0.1216, beats_loss=0.0101, ecapa_loss=0.0002509, whisper_loss=0.109, over 20512.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01129, ecapa_loss=0.0002002, whisper_loss=0.09425, over 3916200.72 frames. 
], batch size: 88, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:45:01,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.754e+01 3.078e+01 3.674e+01 5.893e+01, threshold=6.156e+01, percent-clipped=1.0 2024-08-11 11:45:04,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1078970.0, ans=0.125 2024-08-11 11:45:08,770 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 11:45:36,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1079170.0, ans=0.125 2024-08-11 11:45:47,292 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 11:46:08,899 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6500, loss[loss=0.1403, beats_loss=0.007758, ecapa_loss=0.0001734, whisper_loss=0.1309, over 17047.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0113, ecapa_loss=0.0002019, whisper_loss=0.09496, over 3930326.63 frames. ], batch size: 62, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:46:12,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079470.0, ans=0.1 2024-08-11 11:46:27,469 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:46:56,526 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 11:47:10,297 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 11:47:13,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1079870.0, ans=0.04949747468305833 2024-08-11 11:47:20,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6550, loss[loss=0.08876, beats_loss=0.0123, ecapa_loss=0.000205, whisper_loss=0.07441, over 18674.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01135, ecapa_loss=0.0002026, whisper_loss=0.09447, over 3958970.47 frames. ], batch size: 76, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:47:27,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.781e+01 3.122e+01 3.450e+01 5.322e+01, threshold=6.243e+01, percent-clipped=0.0 2024-08-11 11:47:28,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1079970.0, ans=0.1 2024-08-11 11:47:28,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1079970.0, ans=0.5 2024-08-11 11:47:30,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1079970.0, ans=0.125 2024-08-11 11:47:42,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. 
limit=6.0 2024-08-11 11:47:44,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1080070.0, ans=0.125 2024-08-11 11:48:09,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1080270.0, ans=0.125 2024-08-11 11:48:15,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1080270.0, ans=15.0 2024-08-11 11:48:26,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1080370.0, ans=0.2 2024-08-11 11:48:27,786 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 11:48:37,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6600, loss[loss=0.1276, beats_loss=0.008958, ecapa_loss=0.0002169, whisper_loss=0.1165, over 23057.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01129, ecapa_loss=0.0002029, whisper_loss=0.09476, over 3980251.93 frames. ], batch size: 93, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:49:16,412 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 11:49:17,941 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 11:49:23,788 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:49:24,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1080770.0, ans=0.2 2024-08-11 11:49:32,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1080770.0, ans=0.0 2024-08-11 11:49:40,334 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 11:49:45,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1080870.0, ans=0.0 2024-08-11 11:49:49,338 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 11:49:50,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6650, loss[loss=0.1126, beats_loss=0.01055, ecapa_loss=0.0002123, whisper_loss=0.09994, over 18965.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0113, ecapa_loss=0.0002031, whisper_loss=0.09469, over 3994183.66 frames. ], batch size: 76, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:49:50,595 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 11:49:54,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.681e+01 2.981e+01 3.448e+01 5.241e+01, threshold=5.962e+01, percent-clipped=0.0 2024-08-11 11:50:03,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1081070.0, ans=0.025 2024-08-11 11:50:03,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1081070.0, ans=0.125 2024-08-11 11:50:16,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1081070.0, ans=0.1 2024-08-11 11:50:24,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1081170.0, ans=0.125 2024-08-11 11:50:30,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1081170.0, ans=0.1 2024-08-11 11:50:42,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, 
batch_count=1081270.0, ans=0.125 2024-08-11 11:50:49,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-11 11:51:01,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6700, loss[loss=0.1022, beats_loss=0.009682, ecapa_loss=0.0002551, whisper_loss=0.08996, over 22033.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0113, ecapa_loss=0.0002046, whisper_loss=0.09441, over 3987681.91 frames. ], batch size: 89, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:51:53,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1081770.0, ans=0.125 2024-08-11 11:52:14,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6750, loss[loss=0.09673, beats_loss=0.01219, ecapa_loss=0.0001495, whisper_loss=0.08304, over 18063.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002047, whisper_loss=0.09424, over 3975744.99 frames. 
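The recurring `optim.py` lines (`Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...`) report a statistic of recent gradient norms; in every record the threshold equals the clipping scale times the logged median (e.g. 2.0 x 2.972e+01 ~ the logged 5.945e+01). A sketch of that relationship, as an illustration only and not the actual icefall optimizer code:

```python
# Sketch of the clipping rule implied by the optim.py records in this log:
#   threshold = clipping_scale * median(recent grad norms)
# and percent-clipped counts how often a batch's grad norm exceeded it.
# Illustrative assumption, not the real icefall implementation.
import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(recent_grad_norms)

# The quartiles logged at 11:40:04 above are
#   2.210e+01 2.830e+01 2.972e+01 3.439e+01 5.876e+01  (min, Q1, median, Q3, max);
# doubling the median reproduces the logged threshold=5.945e+01 (up to rounding).
print(clip_threshold([22.10, 28.30, 29.72, 34.39, 58.76]))  # 59.44
```

A `percent-clipped` of 0.0 in most records therefore just means no recent gradient norm exceeded twice the running median.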
], batch size: 69, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:52:18,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.942e+01 3.557e+01 4.197e+01 2.407e+02, threshold=7.114e+01, percent-clipped=7.0 2024-08-11 11:52:20,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1081970.0, ans=0.125 2024-08-11 11:52:31,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1082070.0, ans=0.125 2024-08-11 11:52:36,175 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.103e+00 2024-08-11 11:52:36,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1082070.0, ans=0.125 2024-08-11 11:52:48,840 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.677e+00 2024-08-11 11:52:51,415 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 11:52:53,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1082170.0, ans=0.125 2024-08-11 11:52:57,688 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 11:53:03,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2024-08-11 11:53:11,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.55 vs. 
limit=12.0 2024-08-11 11:53:18,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1082370.0, ans=0.125 2024-08-11 11:53:18,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082370.0, ans=0.1 2024-08-11 11:53:19,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1082370.0, ans=0.125 2024-08-11 11:53:19,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1082370.0, ans=0.0 2024-08-11 11:53:26,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-11 11:53:27,015 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6800, loss[loss=0.09949, beats_loss=0.01096, ecapa_loss=0.0002353, whisper_loss=0.08618, over 18925.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01123, ecapa_loss=0.0002049, whisper_loss=0.09464, over 3937903.02 frames. ], batch size: 79, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:53:35,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1082470.0, ans=0.125 2024-08-11 11:53:36,227 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 11:53:42,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1082570.0, ans=0.125 2024-08-11 11:53:43,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082570.0, ans=0.1 2024-08-11 11:53:52,499 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 11:54:15,876 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 11:54:22,198 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 11:54:23,434 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 11:54:30,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1082870.0, ans=0.04949747468305833 2024-08-11 11:54:33,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1082870.0, ans=0.125 2024-08-11 11:54:33,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1082870.0, ans=0.2 2024-08-11 11:54:35,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082870.0, ans=0.1 2024-08-11 11:54:39,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6850, loss[loss=0.09667, beats_loss=0.01273, ecapa_loss=0.0001971, whisper_loss=0.08197, over 14512.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01124, ecapa_loss=0.0002026, whisper_loss=0.09449, over 3921777.17 frames. ], batch size: 61, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:54:44,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.694e+01 2.999e+01 3.363e+01 5.238e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 11:54:44,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1082970.0, ans=0.0 2024-08-11 11:54:55,528 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-11 11:55:06,703 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 11:55:23,726 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 11:55:31,566 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 11:55:33,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2024-08-11 11:55:35,382 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 11:55:39,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1083370.0, ans=0.0 2024-08-11 11:55:43,143 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 11:55:49,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6900, loss[loss=0.1202, beats_loss=0.01078, ecapa_loss=0.0002355, whisper_loss=0.107, over 22361.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01117, ecapa_loss=0.0002014, whisper_loss=0.09535, over 3903051.64 frames. ], batch size: 90, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:55:49,929 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 11:55:51,236 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 11:55:51,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1083470.0, ans=0.125 2024-08-11 11:55:51,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1083470.0, ans=0.1 2024-08-11 11:56:17,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1083670.0, ans=0.125 2024-08-11 11:56:25,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1083670.0, ans=0.0 2024-08-11 11:56:28,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1083670.0, ans=0.0 2024-08-11 11:56:36,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1083770.0, ans=0.2 2024-08-11 11:56:41,263 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 11:56:57,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 6950, loss[loss=0.1083, beats_loss=0.01203, ecapa_loss=0.0002161, whisper_loss=0.09415, over 20884.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01133, ecapa_loss=0.0002, whisper_loss=0.09478, over 3906496.77 frames. 
], batch size: 87, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:57:00,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1083970.0, ans=0.07 2024-08-11 11:57:01,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.671e+01 2.938e+01 3.749e+01 5.482e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-11 11:57:22,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1084070.0, ans=15.0 2024-08-11 11:57:26,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1084170.0, ans=0.025 2024-08-11 11:57:30,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1084170.0, ans=0.125 2024-08-11 11:57:32,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-08-11 11:57:32,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1084170.0, ans=0.0 2024-08-11 11:57:34,864 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 11:57:38,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1084270.0, ans=0.0 2024-08-11 11:57:47,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1084270.0, ans=0.0 2024-08-11 11:57:51,270 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 11:58:03,316 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 11:58:04,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7000, loss[loss=0.09881, beats_loss=0.01298, ecapa_loss=0.0002559, whisper_loss=0.08327, over 16049.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01134, ecapa_loss=0.0002007, whisper_loss=0.09465, over 3907001.61 frames. ], batch size: 72, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:58:18,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084570.0, ans=0.1 2024-08-11 11:58:24,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1084570.0, ans=0.0 2024-08-11 11:58:28,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1084570.0, ans=0.2 2024-08-11 11:58:29,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084570.0, ans=0.1 2024-08-11 11:58:36,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-11 11:58:44,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084770.0, ans=0.1 2024-08-11 11:58:47,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1084770.0, ans=0.125 2024-08-11 11:58:52,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1084770.0, ans=0.125 2024-08-11 11:59:11,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7050, loss[loss=0.1027, beats_loss=0.01144, ecapa_loss=0.0001942, whisper_loss=0.08932, over 22450.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01136, ecapa_loss=0.0002, whisper_loss=0.09409, over 3925362.56 frames. ], batch size: 91, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:59:15,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.647e+01 2.921e+01 3.539e+01 5.654e+01, threshold=5.842e+01, percent-clipped=0.0 2024-08-11 11:59:18,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1084970.0, ans=0.0 2024-08-11 11:59:22,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084970.0, ans=0.1 2024-08-11 11:59:33,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-11 11:59:43,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1085170.0, ans=0.125 2024-08-11 12:00:14,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1085370.0, ans=0.0 2024-08-11 12:00:15,390 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-11 12:00:19,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7100, loss[loss=0.09942, beats_loss=0.008474, ecapa_loss=0.0002476, whisper_loss=0.08847, over 16092.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01127, ecapa_loss=0.0001988, whisper_loss=0.09432, over 3925035.66 frames. ], batch size: 64, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:00:21,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. 
limit=22.5 2024-08-11 12:00:32,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5 2024-08-11 12:00:37,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1085570.0, ans=0.0 2024-08-11 12:00:40,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1085570.0, ans=10.0 2024-08-11 12:00:47,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1085670.0, ans=0.125 2024-08-11 12:01:00,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1085770.0, ans=0.0 2024-08-11 12:01:17,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1085870.0, ans=0.09899494936611666 2024-08-11 12:01:18,908 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 18 from Vox, 55 fro AS 2024-08-11 12:01:24,470 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 12:01:25,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7150, loss[loss=0.1146, beats_loss=0.01121, ecapa_loss=0.0001983, whisper_loss=0.1014, over 22522.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0001967, whisper_loss=0.09394, over 3958376.03 frames. 
], batch size: 91, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:01:29,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1085970.0, ans=0.0 2024-08-11 12:01:29,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.825e+01 3.133e+01 3.530e+01 6.975e+01, threshold=6.267e+01, percent-clipped=1.0 2024-08-11 12:01:35,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1085970.0, ans=0.125 2024-08-11 12:01:44,130 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 12:01:44,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-11 12:01:45,478 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 12:01:50,682 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 12:01:51,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2024-08-11 12:01:56,015 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-11 12:02:03,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1086170.0, ans=0.125 2024-08-11 12:02:16,449 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 12:02:18,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. 
limit=15.0 2024-08-11 12:02:30,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1086370.0, ans=0.1 2024-08-11 12:02:32,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7200, loss[loss=0.1093, beats_loss=0.01198, ecapa_loss=0.0002823, whisper_loss=0.09449, over 20443.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0001981, whisper_loss=0.09376, over 3954232.40 frames. ], batch size: 88, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:03:05,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1086670.0, ans=0.0 2024-08-11 12:03:26,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1086870.0, ans=0.125 2024-08-11 12:03:29,933 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 18 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 12:03:35,149 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-11 12:03:40,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7250, loss[loss=0.113, beats_loss=0.01159, ecapa_loss=0.0001405, whisper_loss=0.09998, over 21158.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01143, ecapa_loss=0.0001981, whisper_loss=0.0934, over 3938848.93 frames. ], batch size: 79, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:03:44,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.767e+01 3.129e+01 3.597e+01 6.037e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 12:03:47,126 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 12:03:50,054 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 12:03:51,268 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
38 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 12:04:03,008 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 17 from LS+wenet, 22 from Vox, 50 from AS 2024-08-11 12:04:07,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1087170.0, ans=0.0 2024-08-11 12:04:43,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1087370.0, ans=0.0 2024-08-11 12:04:45,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-11 12:04:47,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7300, loss[loss=0.1104, beats_loss=0.01167, ecapa_loss=0.0001726, whisper_loss=0.09702, over 22833.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01148, ecapa_loss=0.0001972, whisper_loss=0.09283, over 3921058.06 frames. ], batch size: 89, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:04:49,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1087470.0, ans=15.0 2024-08-11 12:05:16,059 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 from AS 2024-08-11 12:05:21,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1087670.0, ans=0.0 2024-08-11 12:05:21,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1087670.0, ans=0.125 2024-08-11 12:05:23,348 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 12:05:26,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1087670.0, ans=0.2 2024-08-11 12:05:30,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1087770.0, ans=0.0 2024-08-11 12:05:32,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1087770.0, ans=0.125 2024-08-11 12:05:41,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1087870.0, ans=0.5 2024-08-11 12:05:51,815 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 12:05:55,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7350, loss[loss=0.09784, beats_loss=0.01139, ecapa_loss=0.0001795, whisper_loss=0.08465, over 22741.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0001979, whisper_loss=0.09309, over 3899702.08 frames. ], batch size: 92, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:05:59,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.629e+01 2.975e+01 3.413e+01 5.829e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 12:06:02,675 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 12:06:14,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1088070.0, ans=0.125 2024-08-11 12:06:25,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.34 vs. 
limit=15.0 2024-08-11 12:06:53,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1088370.0, ans=0.125 2024-08-11 12:06:56,020 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 12:06:59,720 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 12:07:03,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7400, loss[loss=0.09673, beats_loss=0.0108, ecapa_loss=0.0002926, whisper_loss=0.083, over 20511.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0001991, whisper_loss=0.09292, over 3906462.80 frames. ], batch size: 92, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:07:11,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1088470.0, ans=0.125 2024-08-11 12:07:20,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2024-08-11 12:07:21,019 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 12:07:26,204 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 from AS 2024-08-11 12:07:29,194 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:07:38,084 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
30 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 12:07:45,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1088770.0, ans=0.0 2024-08-11 12:07:50,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1088770.0, ans=0.1 2024-08-11 12:08:02,501 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 from AS 2024-08-11 12:08:10,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7450, loss[loss=0.08242, beats_loss=0.01214, ecapa_loss=0.000176, whisper_loss=0.06852, over 14755.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01142, ecapa_loss=0.0001989, whisper_loss=0.0931, over 3918540.06 frames. ], batch size: 58, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:08:14,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.716e+01 3.101e+01 3.669e+01 6.917e+01, threshold=6.202e+01, percent-clipped=1.0 2024-08-11 12:08:44,526 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 from AS 2024-08-11 12:08:59,717 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 12:09:02,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1089270.0, ans=0.125 2024-08-11 12:09:05,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-08-11 12:09:09,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1089370.0, ans=0.125 2024-08-11 12:09:17,337 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
24 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 12:09:17,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089370.0, ans=0.1 2024-08-11 12:09:21,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7500, loss[loss=0.1164, beats_loss=0.01127, ecapa_loss=0.0001979, whisper_loss=0.1031, over 22293.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01139, ecapa_loss=0.000201, whisper_loss=0.09333, over 3911306.23 frames. ], batch size: 90, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:09:24,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1089470.0, ans=0.125 2024-08-11 12:09:32,616 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 12:09:43,671 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-11 12:09:47,023 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 from AS 2024-08-11 12:10:00,819 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-11 12:10:15,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1089770.0, ans=0.0 2024-08-11 12:10:18,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.52 vs. limit=5.0 2024-08-11 12:10:20,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1089870.0, ans=0.0 2024-08-11 12:10:24,191 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-11 12:10:29,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1089870.0, ans=0.1 2024-08-11 12:10:32,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7550, loss[loss=0.09982, beats_loss=0.01188, ecapa_loss=0.0002216, whisper_loss=0.08572, over 18847.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01141, ecapa_loss=0.0002006, whisper_loss=0.09307, over 3907917.96 frames. ], batch size: 78, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:10:36,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.654e+01 2.939e+01 3.334e+01 5.450e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-11 12:10:41,425 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 from AS 2024-08-11 12:10:43,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1089970.0, ans=0.125 2024-08-11 12:10:44,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.67 vs. limit=10.0 2024-08-11 12:10:48,016 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.852e-01 2024-08-11 12:10:51,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1090070.0, ans=0.1 2024-08-11 12:11:15,689 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 from AS 2024-08-11 12:11:30,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1090370.0, ans=0.0 2024-08-11 12:11:39,713 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
21 from LS+wenet, 10 from Vox, 23 from AS 2024-08-11 12:11:44,026 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7600, loss[loss=0.08478, beats_loss=0.01467, ecapa_loss=0.0001686, whisper_loss=0.06842, over 14301.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0114, ecapa_loss=0.0001993, whisper_loss=0.09309, over 3865177.27 frames. ], batch size: 57, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:12:08,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.77 vs. limit=10.0 2024-08-11 12:12:10,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1090670.0, ans=0.0 2024-08-11 12:12:14,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1090670.0, ans=0.04949747468305833 2024-08-11 12:12:44,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1090870.0, ans=0.05 2024-08-11 12:12:52,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7650, loss[loss=0.1197, beats_loss=0.0119, ecapa_loss=0.0001795, whisper_loss=0.106, over 18787.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0113, ecapa_loss=0.0002002, whisper_loss=0.09307, over 3880909.93 frames. ], batch size: 73, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:12:56,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.822e+01 3.132e+01 3.571e+01 5.523e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 12:13:03,354 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
28 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 12:13:22,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1091170.0, ans=0.125 2024-08-11 12:13:33,049 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS 2024-08-11 12:13:43,826 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 31 from Vox, 31 from AS 2024-08-11 12:13:48,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1091370.0, ans=0.2 2024-08-11 12:13:54,461 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-11 12:13:55,888 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 19 from Vox, 46 from AS 2024-08-11 12:13:59,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7700, loss[loss=0.09784, beats_loss=0.01191, ecapa_loss=0.0001464, whisper_loss=0.08447, over 17080.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0002001, whisper_loss=0.09277, over 3879711.76 frames. ], batch size: 66, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:14:06,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091470.0, ans=0.125 2024-08-11 12:14:10,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1091470.0, ans=0.2 2024-08-11 12:14:30,368 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 12 from Vox, 26 from AS 2024-08-11 12:14:33,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1091670.0, ans=0.125 2024-08-11 12:15:05,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7750, loss[loss=0.1035, beats_loss=0.01262, ecapa_loss=0.0002211, whisper_loss=0.08864, over 16191.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01135, ecapa_loss=0.0001983, whisper_loss=0.09239, over 3865009.00 frames. ], batch size: 67, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:15:10,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.756e+01 3.140e+01 3.838e+01 1.235e+02, threshold=6.279e+01, percent-clipped=2.0 2024-08-11 12:15:28,083 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 12:15:32,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1092170.0, ans=15.0 2024-08-11 12:15:41,327 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 12:15:47,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1092270.0, ans=0.0 2024-08-11 12:15:49,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1092270.0, ans=0.05 2024-08-11 12:15:53,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1092270.0, ans=0.125 2024-08-11 12:16:00,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1092370.0, ans=0.04949747468305833 2024-08-11 12:16:01,870 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
14 from LS+wenet, 15 from Vox, 24 from AS 2024-08-11 12:16:05,562 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 31 from LS+wenet, 29 from Vox, 36 from AS 2024-08-11 12:16:07,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1092370.0, ans=0.125 2024-08-11 12:16:10,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7800, loss[loss=0.09693, beats_loss=0.009554, ecapa_loss=0.00022, whisper_loss=0.08518, over 13740.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.000198, whisper_loss=0.09299, over 3858511.55 frames. ], batch size: 56, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:16:15,204 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 16 from Vox, 40 from AS 2024-08-11 12:16:19,278 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS 2024-08-11 12:16:21,959 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS 2024-08-11 12:16:27,257 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 29 from LS+wenet, 15 from Vox, 16 from AS 2024-08-11 12:17:07,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1092870.0, ans=0.125 2024-08-11 12:17:17,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7850, loss[loss=0.1001, beats_loss=0.01305, ecapa_loss=0.0001642, whisper_loss=0.08538, over 21809.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01126, ecapa_loss=0.0001986, whisper_loss=0.09361, over 3869568.12 frames. ], batch size: 85, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:17:21,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.734e+01 3.036e+01 3.446e+01 5.621e+01, threshold=6.073e+01, percent-clipped=0.0 2024-08-11 12:17:21,738 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
24 from LS+wenet, 23 from Vox, 18 from AS 2024-08-11 12:17:32,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2024-08-11 12:17:38,019 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 12:17:54,861 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 from AS 2024-08-11 12:17:56,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1093270.0, ans=0.0 2024-08-11 12:18:11,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1093370.0, ans=0.2 2024-08-11 12:18:24,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7900, loss[loss=0.1005, beats_loss=0.0092, ecapa_loss=0.0002181, whisper_loss=0.08916, over 18125.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001985, whisper_loss=0.09312, over 3871980.70 frames. ], batch size: 69, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:18:28,796 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 12 from Vox, 31 from AS 2024-08-11 12:18:35,288 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 from AS 2024-08-11 12:18:35,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1093470.0, ans=0.1 2024-08-11 12:18:43,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1093570.0, ans=0.125 2024-08-11 12:19:00,419 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
24 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 12:19:00,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1093670.0, ans=0.125 2024-08-11 12:19:01,634 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-11 12:19:16,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1093870.0, ans=0.125 2024-08-11 12:19:30,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 7950, loss[loss=0.09722, beats_loss=0.01316, ecapa_loss=0.0001543, whisper_loss=0.08251, over 18166.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01145, ecapa_loss=0.0001962, whisper_loss=0.09289, over 3884907.10 frames. ], batch size: 71, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:19:34,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.750e+01 3.082e+01 3.483e+01 5.642e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-11 12:20:10,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1094270.0, ans=0.1 2024-08-11 12:20:14,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1094270.0, ans=0.0 2024-08-11 12:20:37,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8000, loss[loss=0.1087, beats_loss=0.008391, ecapa_loss=0.0002394, whisper_loss=0.09791, over 21355.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01148, ecapa_loss=0.0001954, whisper_loss=0.09311, over 3899097.46 frames. ], batch size: 87, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:20:37,779 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 from AS 2024-08-11 12:20:46,902 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
22 from LS+wenet, 22 from Vox, 20 from AS 2024-08-11 12:21:09,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1094670.0, ans=0.125 2024-08-11 12:21:17,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1094770.0, ans=0.0 2024-08-11 12:21:32,869 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS 2024-08-11 12:21:33,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1094870.0, ans=0.125 2024-08-11 12:21:42,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1094870.0, ans=0.125 2024-08-11 12:21:44,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8050, loss[loss=0.0899, beats_loss=0.01213, ecapa_loss=0.0001937, whisper_loss=0.07583, over 18884.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0001956, whisper_loss=0.09369, over 3867494.66 frames. ], batch size: 79, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:21:46,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-11 12:21:48,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.683e+01 3.112e+01 3.562e+01 5.362e+01, threshold=6.224e+01, percent-clipped=0.0 2024-08-11 12:21:54,203 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 from AS 2024-08-11 12:21:54,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1094970.0, ans=0.125 2024-08-11 12:22:24,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1095270.0, ans=0.07 2024-08-11 12:22:38,205 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.725e-01 2024-08-11 12:22:47,061 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 from AS 2024-08-11 12:22:52,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8100, loss[loss=0.1213, beats_loss=0.009938, ecapa_loss=0.0001795, whisper_loss=0.1095, over 23500.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01128, ecapa_loss=0.0001963, whisper_loss=0.09451, over 3885871.26 frames. ], batch size: 90, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:22:54,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1095470.0, ans=0.125 2024-08-11 12:22:55,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1095470.0, ans=0.0 2024-08-11 12:22:57,638 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS 2024-08-11 12:22:59,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=1095470.0, ans=22.5 2024-08-11 12:23:07,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2024-08-11 12:23:08,141 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 24 from Vox, 28 from AS 2024-08-11 12:23:12,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1095570.0, ans=0.125 2024-08-11 12:23:14,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1095570.0, ans=0.0 2024-08-11 12:23:29,422 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 12:23:43,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1095770.0, ans=0.2 2024-08-11 12:23:49,872 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 12:23:50,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1095870.0, ans=0.0 2024-08-11 12:23:58,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8150, loss[loss=0.1191, beats_loss=0.01116, ecapa_loss=0.0002359, whisper_loss=0.1056, over 21428.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0113, ecapa_loss=0.0001981, whisper_loss=0.09373, over 3900604.18 frames. ], batch size: 92, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:24:03,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.662e+01 2.951e+01 3.382e+01 5.794e+01, threshold=5.903e+01, percent-clipped=0.0 2024-08-11 12:24:09,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1095970.0, ans=0.2 2024-08-11 12:24:13,988 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 12:24:43,830 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS 2024-08-11 12:25:01,267 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 
27 from LS+wenet, 25 from Vox, 45 from AS 2024-08-11 12:25:06,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8200, loss[loss=0.1115, beats_loss=0.009481, ecapa_loss=0.0001842, whisper_loss=0.1001, over 18907.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01136, ecapa_loss=0.0001983, whisper_loss=0.09318, over 3900730.58 frames. ], batch size: 72, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:25:10,539 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 12:25:13,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1096470.0, ans=0.125 2024-08-11 12:25:24,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-11 12:25:25,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. 
limit=15.0 2024-08-11 12:25:32,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1096670.0, ans=0.2 2024-08-11 12:25:34,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1096670.0, ans=0.125 2024-08-11 12:25:38,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1096670.0, ans=0.125 2024-08-11 12:25:46,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1096770.0, ans=0.0 2024-08-11 12:25:48,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1096770.0, ans=0.125 2024-08-11 12:25:52,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1096770.0, ans=0.1 2024-08-11 12:26:07,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2024-08-11 12:26:12,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8250, loss[loss=0.09707, beats_loss=0.0131, ecapa_loss=0.0001548, whisper_loss=0.08242, over 24086.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001987, whisper_loss=0.09317, over 3870207.95 frames. ], batch size: 94, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:26:16,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.782e+01 3.103e+01 3.474e+01 6.879e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 12:26:29,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1097070.0, ans=0.125 2024-08-11 12:26:39,865 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
22 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 12:26:43,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1097170.0, ans=0.125 2024-08-11 12:26:45,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1097170.0, ans=0.125 2024-08-11 12:26:49,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1097170.0, ans=0.125 2024-08-11 12:27:04,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1097270.0, ans=0.1 2024-08-11 12:27:10,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1097370.0, ans=0.125 2024-08-11 12:27:15,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1097370.0, ans=0.0 2024-08-11 12:27:19,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8300, loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001994, whisper_loss=0.09192, over 17839.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01128, ecapa_loss=0.0002, whisper_loss=0.09339, over 3892457.04 frames. ], batch size: 68, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:27:28,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-11 12:27:38,015 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 12:27:51,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1097670.0, ans=0.125 2024-08-11 12:27:53,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1097670.0, ans=0.125 2024-08-11 12:28:04,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1097770.0, ans=0.125 2024-08-11 12:28:06,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1097770.0, ans=0.2 2024-08-11 12:28:26,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8350, loss[loss=0.1064, beats_loss=0.0107, ecapa_loss=0.0002611, whisper_loss=0.09311, over 17589.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01128, ecapa_loss=0.0002, whisper_loss=0.0933, over 3865858.77 frames. ], batch size: 73, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:28:26,451 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 12:28:30,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.714e+01 3.261e+01 3.683e+01 6.544e+01, threshold=6.523e+01, percent-clipped=1.0 2024-08-11 12:28:58,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=12.0 2024-08-11 12:29:10,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2024-08-11 12:29:11,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098270.0, ans=0.1 2024-08-11 12:29:16,445 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 12:29:20,585 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 12:29:26,243 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:29:34,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8400, loss[loss=0.1115, beats_loss=0.01119, ecapa_loss=0.0002167, whisper_loss=0.09815, over 13119.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01133, ecapa_loss=0.0001989, whisper_loss=0.09341, over 3865063.19 frames. ], batch size: 53, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:29:40,658 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 12:29:48,455 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 12:29:49,740 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 12:29:58,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1098570.0, ans=0.125 2024-08-11 12:30:08,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.48 vs. limit=10.0 2024-08-11 12:30:27,781 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 12:30:32,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1098870.0, ans=0.0 2024-08-11 12:30:34,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1098870.0, ans=0.2 2024-08-11 12:30:40,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8450, loss[loss=0.121, beats_loss=0.01059, ecapa_loss=0.0001875, whisper_loss=0.1085, over 16453.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01122, ecapa_loss=0.0001998, whisper_loss=0.09408, over 3843212.47 frames. ], batch size: 64, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:30:44,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1098970.0, ans=0.0 2024-08-11 12:30:44,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.709e+01 3.054e+01 3.505e+01 4.740e+01, threshold=6.108e+01, percent-clipped=0.0 2024-08-11 12:30:50,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1098970.0, ans=0.0 2024-08-11 12:30:58,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1099070.0, ans=0.125 2024-08-11 12:31:00,038 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 12:31:00,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1099070.0, ans=0.125 2024-08-11 12:31:11,331 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 12:31:19,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1099270.0, ans=0.0 2024-08-11 12:31:27,396 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 12:31:40,510 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 26 from LS+wenet, 21 from Vox, 12 fro AS 2024-08-11 12:31:46,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8500, loss[loss=0.0936, beats_loss=0.01221, ecapa_loss=0.0002087, whisper_loss=0.07931, over 14717.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0113, ecapa_loss=0.0002005, whisper_loss=0.09264, over 3831073.78 frames. 
], batch size: 63, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:31:48,393 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 12:31:50,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5 2024-08-11 12:32:10,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1099570.0, ans=0.1 2024-08-11 12:32:33,889 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 12:32:39,295 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 12:32:48,747 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 25 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-11 12:32:56,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8550, loss[loss=0.1418, beats_loss=0.008672, ecapa_loss=0.0001806, whisper_loss=0.1313, over 20025.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01132, ecapa_loss=0.0001994, whisper_loss=0.09339, over 3862503.55 frames. ], batch size: 76, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:33:00,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.728e+01 3.009e+01 3.613e+01 5.860e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 12:33:39,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. 
limit=10.0 2024-08-11 12:33:41,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1100270.0, ans=0.2 2024-08-11 12:33:44,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1100270.0, ans=0.125 2024-08-11 12:33:47,860 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 12:34:02,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-11 12:34:10,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1100470.0, ans=0.0 2024-08-11 12:34:11,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8600, loss[loss=0.09143, beats_loss=0.01274, ecapa_loss=0.0001651, whisper_loss=0.07704, over 21799.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0114, ecapa_loss=0.0001971, whisper_loss=0.09344, over 3891284.88 frames. ], batch size: 88, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:34:12,045 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 12:34:27,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1100570.0, ans=0.0 2024-08-11 12:34:30,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1100570.0, ans=0.125 2024-08-11 12:34:34,108 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 12:34:39,337 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 12:35:14,865 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
18 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 12:35:21,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1100870.0, ans=0.1 2024-08-11 12:35:26,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8650, loss[loss=0.06466, beats_loss=0.01482, ecapa_loss=0.0002505, whisper_loss=0.04733, over 16211.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01143, ecapa_loss=0.0001985, whisper_loss=0.09237, over 3863269.69 frames. ], batch size: 71, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:35:30,135 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 12:35:31,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.631e+01 2.958e+01 3.559e+01 6.258e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-11 12:35:40,257 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 12:35:44,507 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 12:35:53,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1101070.0, ans=0.0 2024-08-11 12:36:14,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1101270.0, ans=0.07 2024-08-11 12:36:18,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-11 12:36:19,062 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 12:36:22,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1101270.0, ans=0.0 2024-08-11 12:36:22,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1101270.0, ans=0.125 2024-08-11 12:36:24,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1101270.0, ans=0.125 2024-08-11 12:36:25,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1101270.0, ans=0.0 2024-08-11 12:36:33,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2024-08-11 12:36:41,431 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 12:36:47,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8700, loss[loss=0.1141, beats_loss=0.01039, ecapa_loss=0.0002071, whisper_loss=0.1016, over 23168.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01141, ecapa_loss=0.0001985, whisper_loss=0.09223, over 3863592.52 frames. ], batch size: 92, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:37:21,395 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 12:37:38,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1101770.0, ans=0.125 2024-08-11 12:37:46,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101770.0, ans=0.1 2024-08-11 12:37:46,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.27 vs. 
limit=12.0 2024-08-11 12:37:50,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1101770.0, ans=0.125 2024-08-11 12:37:53,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-11 12:38:11,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8750, loss[loss=0.1194, beats_loss=0.01193, ecapa_loss=0.0001401, whisper_loss=0.1061, over 17195.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01128, ecapa_loss=0.0002001, whisper_loss=0.0935, over 3860605.51 frames. ], batch size: 62, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:38:12,764 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 12:38:14,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-11 12:38:15,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.807e+01 3.199e+01 3.848e+01 5.840e+01, threshold=6.398e+01, percent-clipped=0.0 2024-08-11 12:38:36,206 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 12:38:43,576 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 12:38:51,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1102170.0, ans=0.0 2024-08-11 12:38:58,056 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
24 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 12:38:58,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1102270.0, ans=0.0 2024-08-11 12:39:01,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1102270.0, ans=0.125 2024-08-11 12:39:25,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8800, loss[loss=0.1081, beats_loss=0.008674, ecapa_loss=0.0002726, whisper_loss=0.09674, over 17729.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0002009, whisper_loss=0.09304, over 3877071.99 frames. ], batch size: 76, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:39:28,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102470.0, ans=0.1 2024-08-11 12:39:58,450 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 12:40:23,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1102770.0, ans=0.025 2024-08-11 12:40:26,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1102770.0, ans=0.125 2024-08-11 12:40:40,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1102870.0, ans=0.0 2024-08-11 12:40:41,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1102870.0, ans=0.125 2024-08-11 12:40:44,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8850, loss[loss=0.09557, beats_loss=0.01433, ecapa_loss=0.0001587, whisper_loss=0.07966, over 22519.00 frames. 
], tot_loss[loss=0.1061, beats_loss=0.01137, ecapa_loss=0.0002002, whisper_loss=0.09275, over 3870191.06 frames. ], batch size: 91, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:40:48,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.779e+01 3.220e+01 3.967e+01 6.531e+01, threshold=6.439e+01, percent-clipped=1.0 2024-08-11 12:41:03,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1103070.0, ans=0.125 2024-08-11 12:41:09,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1103070.0, ans=0.1 2024-08-11 12:41:41,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1103270.0, ans=0.0 2024-08-11 12:42:00,917 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 12:42:03,798 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 12:42:06,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8900, loss[loss=0.1267, beats_loss=0.01043, ecapa_loss=0.0001905, whisper_loss=0.1144, over 16541.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01136, ecapa_loss=0.0001999, whisper_loss=0.0932, over 3883457.83 frames. ], batch size: 62, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:42:10,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1103470.0, ans=0.1 2024-08-11 12:42:21,846 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 12:42:30,094 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
24 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 12:42:33,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1103570.0, ans=0.125 2024-08-11 12:42:36,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1103570.0, ans=0.125 2024-08-11 12:42:36,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1103570.0, ans=0.125 2024-08-11 12:42:50,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1103670.0, ans=0.125 2024-08-11 12:42:59,165 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 12:43:02,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1103770.0, ans=0.0 2024-08-11 12:43:18,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1103870.0, ans=0.125 2024-08-11 12:43:24,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 8950, loss[loss=0.09878, beats_loss=0.0117, ecapa_loss=0.0001861, whisper_loss=0.08522, over 19689.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01143, ecapa_loss=0.0002007, whisper_loss=0.09292, over 3886510.78 frames. ], batch size: 79, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:43:27,759 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
16 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 12:43:28,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.747e+01 3.145e+01 3.619e+01 5.572e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 12:43:30,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1103970.0, ans=0.0 2024-08-11 12:43:32,412 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 12:43:41,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1104070.0, ans=0.2 2024-08-11 12:43:48,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1104070.0, ans=0.125 2024-08-11 12:43:53,535 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 12:44:16,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1104270.0, ans=0.0 2024-08-11 12:44:16,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1104270.0, ans=0.2 2024-08-11 12:44:24,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1104370.0, ans=0.1 2024-08-11 12:44:25,814 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 12:44:37,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1104370.0, ans=0.125 2024-08-11 12:44:39,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9000, loss[loss=0.1215, beats_loss=0.01103, ecapa_loss=0.0001765, whisper_loss=0.1087, over 22679.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.0002009, whisper_loss=0.09306, over 3895746.02 frames. 
], batch size: 87, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:44:39,431 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 12:45:15,355 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on ASR_libri: loss=0.2575, beats_loss=0, ecapa_loss=0.0006551, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 12:45:34,114 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on SV_voxceleb1: loss=0.005315, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0, over 939242.00 frames. 2024-08-11 12:47:19,788 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on AT_audioset: loss=0.02529, beats_loss=0.02529, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 12:47:19,792 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 12:47:20,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1104470.0, ans=0.0 2024-08-11 12:47:24,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1104470.0, ans=0.2 2024-08-11 12:47:37,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1104570.0, ans=0.125 2024-08-11 12:47:38,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1104570.0, ans=0.125 2024-08-11 12:47:52,556 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 12:47:57,627 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
35 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 12:48:09,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1104770.0, ans=0.125 2024-08-11 12:48:09,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1104770.0, ans=0.0 2024-08-11 12:48:15,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1104770.0, ans=0.0 2024-08-11 12:48:22,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0 2024-08-11 12:48:36,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9050, loss[loss=0.1231, beats_loss=0.01148, ecapa_loss=0.0001917, whisper_loss=0.1097, over 19894.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01127, ecapa_loss=0.0002004, whisper_loss=0.09418, over 3916537.64 frames. ], batch size: 80, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:48:37,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1104970.0, ans=0.125 2024-08-11 12:48:41,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.751e+01 3.167e+01 3.446e+01 7.186e+01, threshold=6.334e+01, percent-clipped=1.0 2024-08-11 12:48:41,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1104970.0, ans=0.0 2024-08-11 12:48:57,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-11 12:49:07,794 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 12:49:15,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1105170.0, ans=0.125 2024-08-11 12:49:43,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1105370.0, ans=0.125 2024-08-11 12:49:51,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1105370.0, ans=0.1 2024-08-11 12:49:53,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9100, loss[loss=0.104, beats_loss=0.01273, ecapa_loss=0.0001843, whisper_loss=0.08942, over 18604.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01117, ecapa_loss=0.0002012, whisper_loss=0.0948, over 3909811.06 frames. ], batch size: 75, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:50:21,615 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 12:50:26,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. 
limit=15.0 2024-08-11 12:50:32,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1105670.0, ans=0.09899494936611666 2024-08-11 12:50:37,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1105670.0, ans=0.125 2024-08-11 12:50:40,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1105770.0, ans=0.2 2024-08-11 12:51:01,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1105870.0, ans=0.125 2024-08-11 12:51:04,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1105870.0, ans=0.125 2024-08-11 12:51:10,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9150, loss[loss=0.1155, beats_loss=0.008687, ecapa_loss=0.0002412, whisper_loss=0.1044, over 14420.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01124, ecapa_loss=0.000201, whisper_loss=0.09384, over 3889986.37 frames. ], batch size: 56, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:51:14,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.723e+01 3.003e+01 3.393e+01 4.790e+01, threshold=6.006e+01, percent-clipped=0.0 2024-08-11 12:51:14,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1105970.0, ans=0.125 2024-08-11 12:51:21,341 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 12:51:30,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1106070.0, ans=0.125 2024-08-11 12:51:34,776 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 24 from Vox, 13 fro AS 2024-08-11 12:51:50,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-11 12:51:51,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 12:51:53,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1106170.0, ans=0.125 2024-08-11 12:52:02,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1106270.0, ans=0.125 2024-08-11 12:52:10,567 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.549e+02 2024-08-11 12:52:11,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1106370.0, ans=0.125 2024-08-11 12:52:25,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9200, loss[loss=0.1309, beats_loss=0.008491, ecapa_loss=0.0001868, whisper_loss=0.1205, over 23837.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.0001993, whisper_loss=0.09356, over 3914127.75 frames. ], batch size: 89, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:52:29,058 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 12:52:31,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106470.0, ans=0.125 2024-08-11 12:52:41,085 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:52:47,548 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
34 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 12:53:13,044 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 12:53:29,463 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 12:53:32,051 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 12:53:42,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9250, loss[loss=0.1105, beats_loss=0.009318, ecapa_loss=0.000281, whisper_loss=0.09836, over 20012.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01126, ecapa_loss=0.0002002, whisper_loss=0.09346, over 3893699.88 frames. ], batch size: 87, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:53:47,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.771e+01 3.106e+01 3.599e+01 1.159e+02, threshold=6.212e+01, percent-clipped=1.0 2024-08-11 12:53:47,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106970.0, ans=0.1 2024-08-11 12:54:06,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-11 12:54:06,808 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 12:54:08,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107070.0, ans=0.1 2024-08-11 12:54:09,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1107070.0, ans=0.0 2024-08-11 12:54:20,628 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 12:54:42,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-11 12:54:48,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-11 12:54:57,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9300, loss[loss=0.1024, beats_loss=0.01002, ecapa_loss=0.0001827, whisper_loss=0.09056, over 18425.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01131, ecapa_loss=0.0001992, whisper_loss=0.09344, over 3876992.68 frames. ], batch size: 71, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:55:12,446 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 12:55:38,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1107670.0, ans=0.125 2024-08-11 12:55:50,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1107770.0, ans=0.0 2024-08-11 12:55:54,389 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 28 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 12:55:56,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=8.0 2024-08-11 12:55:59,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-08-11 12:56:00,591 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 12:56:12,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9350, loss[loss=0.1102, beats_loss=0.01096, ecapa_loss=0.0002259, whisper_loss=0.097, over 19103.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01123, ecapa_loss=0.0002008, whisper_loss=0.09351, over 3872611.01 frames. ], batch size: 79, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:56:17,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.772e+01 2.988e+01 3.438e+01 1.215e+02, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 12:56:24,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1107970.0, ans=0.125 2024-08-11 12:56:37,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1108070.0, ans=0.0 2024-08-11 12:56:41,043 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 12:56:52,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1108170.0, ans=0.125 2024-08-11 12:57:02,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1108270.0, ans=0.125 2024-08-11 12:57:14,801 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 12:57:28,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9400, loss[loss=0.1053, beats_loss=0.009565, ecapa_loss=0.0002295, whisper_loss=0.09339, over 21645.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01127, ecapa_loss=0.0002007, whisper_loss=0.09318, over 3857397.97 frames. 
], batch size: 90, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:57:43,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1108570.0, ans=0.2 2024-08-11 12:57:47,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1108570.0, ans=0.125 2024-08-11 12:58:07,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1108670.0, ans=0.125 2024-08-11 12:58:18,372 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 12:58:45,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9450, loss[loss=0.1112, beats_loss=0.01281, ecapa_loss=0.0001657, whisper_loss=0.09673, over 19805.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0001987, whisper_loss=0.0929, over 3862378.43 frames. ], batch size: 77, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:58:49,340 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 12:58:50,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.669e+01 3.064e+01 3.549e+01 5.554e+01, threshold=6.127e+01, percent-clipped=0.0 2024-08-11 12:58:50,608 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 12:59:02,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-11 12:59:27,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-08-11 12:59:29,490 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 12:59:33,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1109270.0, ans=0.0 2024-08-11 12:59:36,115 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 41 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 12:59:38,673 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 12:59:41,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1109270.0, ans=0.125 2024-08-11 12:59:51,445 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-11 12:59:54,699 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 13:00:00,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9500, loss[loss=0.09755, beats_loss=0.009602, ecapa_loss=0.0001609, whisper_loss=0.08634, over 15168.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001987, whisper_loss=0.09264, over 3887162.04 frames. ], batch size: 55, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:00:34,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1109670.0, ans=0.0 2024-08-11 13:00:37,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2024-08-11 13:00:48,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1109770.0, ans=0.125 2024-08-11 13:01:02,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1109870.0, ans=0.2 2024-08-11 13:01:09,359 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 13:01:11,111 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 13:01:13,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9550, loss[loss=0.1039, beats_loss=0.0112, ecapa_loss=0.0001874, whisper_loss=0.09084, over 16977.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01142, ecapa_loss=0.0001991, whisper_loss=0.09208, over 3871164.69 frames. ], batch size: 67, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:01:15,827 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 13:01:18,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.581e+01 3.097e+01 3.550e+01 5.814e+01, threshold=6.195e+01, percent-clipped=0.0 2024-08-11 13:01:33,639 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 13:01:34,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1110070.0, ans=0.0 2024-08-11 13:01:34,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1110070.0, ans=0.0 2024-08-11 13:01:46,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1110170.0, ans=0.125 2024-08-11 13:01:48,077 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 13:01:49,630 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 13:02:12,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110270.0, ans=0.1 2024-08-11 13:02:20,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1110370.0, ans=0.09899494936611666 2024-08-11 13:02:28,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9600, loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0002141, whisper_loss=0.09234, over 20011.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01133, ecapa_loss=0.0001998, whisper_loss=0.09271, over 3883092.87 frames. ], batch size: 81, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:02:31,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.97 vs. limit=22.5 2024-08-11 13:02:42,346 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 13:02:53,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1110570.0, ans=0.125 2024-08-11 13:02:57,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110670.0, ans=0.1 2024-08-11 13:03:05,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1110670.0, ans=0.0 2024-08-11 13:03:21,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110770.0, ans=0.1 2024-08-11 13:03:28,289 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 13:03:39,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9650, loss[loss=0.08879, beats_loss=0.01194, ecapa_loss=0.0001648, whisper_loss=0.0752, over 14299.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0002008, whisper_loss=0.09314, over 3854931.85 frames. ], batch size: 55, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:03:43,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.716e+01 3.002e+01 3.574e+01 5.577e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 13:03:56,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1111070.0, ans=0.125 2024-08-11 13:03:57,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2024-08-11 13:04:11,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1111170.0, ans=0.125 2024-08-11 13:04:14,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1111170.0, ans=0.0 2024-08-11 13:04:31,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-11 13:04:47,650 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 13:04:50,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9700, loss[loss=0.09356, beats_loss=0.01287, ecapa_loss=0.0001578, whisper_loss=0.07911, over 23569.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01123, ecapa_loss=0.000202, whisper_loss=0.09307, over 3839903.34 frames. 
], batch size: 93, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:05:01,144 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 13:05:08,150 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 13:05:26,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.76 vs. limit=10.0 2024-08-11 13:05:45,593 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 13:06:03,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9750, loss[loss=0.1155, beats_loss=0.01097, ecapa_loss=0.0001936, whisper_loss=0.1026, over 23182.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01126, ecapa_loss=0.0001996, whisper_loss=0.09326, over 3831394.43 frames. ], batch size: 92, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:06:08,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.596e+01 2.916e+01 3.374e+01 5.743e+01, threshold=5.832e+01, percent-clipped=0.0 2024-08-11 13:06:26,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1112070.0, ans=0.1 2024-08-11 13:06:32,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1112170.0, ans=0.125 2024-08-11 13:06:34,535 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:06:57,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1112270.0, ans=0.2 2024-08-11 13:07:17,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9800, loss[loss=0.09375, beats_loss=0.01297, ecapa_loss=0.0001866, whisper_loss=0.07891, over 15312.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01133, ecapa_loss=0.0001988, whisper_loss=0.09262, over 3839929.49 frames. ], batch size: 62, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:07:23,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112470.0, ans=0.1 2024-08-11 13:07:25,529 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 13:07:34,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2024-08-11 13:07:36,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2024-08-11 13:07:43,410 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-11 13:07:44,614 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 13:07:50,063 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 13:08:02,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1112770.0, ans=0.0 2024-08-11 13:08:28,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1112870.0, ans=0.125 2024-08-11 13:08:30,163 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
32 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 13:08:32,138 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.034e-01 2024-08-11 13:08:32,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9850, loss[loss=0.09548, beats_loss=0.01601, ecapa_loss=0.0001357, whisper_loss=0.07811, over 15010.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001995, whisper_loss=0.09319, over 3815548.50 frames. ], batch size: 57, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:08:37,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.640e+01 2.920e+01 3.284e+01 5.372e+01, threshold=5.839e+01, percent-clipped=0.0 2024-08-11 13:08:43,804 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-11 13:08:48,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1112970.0, ans=0.125 2024-08-11 13:09:02,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=22.5 2024-08-11 13:09:16,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1113170.0, ans=0.2 2024-08-11 13:09:22,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. 
limit=15.0 2024-08-11 13:09:31,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1113270.0, ans=0.0 2024-08-11 13:09:33,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1113370.0, ans=0.0 2024-08-11 13:09:38,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.56 vs. limit=22.5 2024-08-11 13:09:41,364 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 13:09:50,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9900, loss[loss=0.1107, beats_loss=0.01276, ecapa_loss=0.0002656, whisper_loss=0.09526, over 19804.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01139, ecapa_loss=0.000199, whisper_loss=0.09294, over 3866046.63 frames. ], batch size: 84, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:09:56,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1113470.0, ans=0.125 2024-08-11 13:10:10,850 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-11 13:10:11,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-11 13:10:25,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2024-08-11 13:10:30,982 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 13:10:34,947 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 13:10:39,008 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.581e-02 2024-08-11 13:10:41,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-08-11 13:10:52,247 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 13:11:18,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 9950, loss[loss=0.1217, beats_loss=0.009665, ecapa_loss=0.0002444, whisper_loss=0.1096, over 17240.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01139, ecapa_loss=0.0001989, whisper_loss=0.09303, over 3852926.66 frames. ], batch size: 68, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:11:24,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.684e+01 2.921e+01 3.407e+01 1.322e+02, threshold=5.842e+01, percent-clipped=4.0 2024-08-11 13:11:29,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1113970.0, ans=0.0 2024-08-11 13:11:36,638 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 13:12:06,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1114170.0, ans=0.0 2024-08-11 13:12:09,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1114170.0, ans=0.2 2024-08-11 13:12:15,712 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 13:12:17,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.83 vs. 
limit=12.0 2024-08-11 13:12:27,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1114270.0, ans=0.125 2024-08-11 13:12:50,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10000, loss[loss=0.08452, beats_loss=0.01435, ecapa_loss=0.0002406, whisper_loss=0.06777, over 16588.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0002, whisper_loss=0.09358, over 3863457.60 frames. ], batch size: 71, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:12:52,429 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 13:12:54,827 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 13:13:06,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1114470.0, ans=0.09899494936611666 2024-08-11 13:13:17,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1114570.0, ans=0.0 2024-08-11 13:13:20,986 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 13:13:32,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-11 13:13:46,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1114770.0, ans=0.025 2024-08-11 13:13:51,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1114770.0, ans=0.125 2024-08-11 13:14:17,619 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-11 13:14:20,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10050, loss[loss=0.09633, beats_loss=0.009863, ecapa_loss=0.000257, whisper_loss=0.0839, over 18228.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01138, ecapa_loss=0.0001996, whisper_loss=0.09268, over 3864045.62 frames. ], batch size: 77, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:14:26,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.703e+01 2.998e+01 3.429e+01 6.033e+01, threshold=5.996e+01, percent-clipped=1.0 2024-08-11 13:14:58,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-11 13:14:59,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1115170.0, ans=0.125 2024-08-11 13:15:23,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2024-08-11 13:15:27,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-11 13:15:38,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1115370.0, ans=0.1 2024-08-11 13:15:57,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10100, loss[loss=0.1019, beats_loss=0.01349, ecapa_loss=0.0002182, whisper_loss=0.08622, over 21414.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01144, ecapa_loss=0.0001994, whisper_loss=0.09251, over 3878018.05 frames. 
], batch size: 89, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:16:02,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1115470.0, ans=0.125 2024-08-11 13:16:06,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1115470.0, ans=0.125 2024-08-11 13:16:13,575 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 13:16:23,851 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 13:16:30,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1115570.0, ans=0.125 2024-08-11 13:16:34,563 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 13:16:37,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1115570.0, ans=0.125 2024-08-11 13:17:02,179 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 13:17:02,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1115670.0, ans=0.95 2024-08-11 13:17:25,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1115770.0, ans=0.0 2024-08-11 13:17:45,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10150, loss[loss=0.1174, beats_loss=0.009328, ecapa_loss=0.0001892, whisper_loss=0.1062, over 20999.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.0002006, whisper_loss=0.09299, over 3884287.80 frames. 
], batch size: 80, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:17:46,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1115970.0, ans=0.125 2024-08-11 13:17:49,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.757e+01 3.072e+01 3.612e+01 1.119e+02, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:18:22,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1116170.0, ans=0.125 2024-08-11 13:18:27,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1116170.0, ans=0.125 2024-08-11 13:18:42,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=12.0 2024-08-11 13:18:42,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1116270.0, ans=0.2 2024-08-11 13:18:47,992 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 13:18:48,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1116370.0, ans=0.125 2024-08-11 13:18:51,564 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 29 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 13:19:00,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10200, loss[loss=0.07966, beats_loss=0.01402, ecapa_loss=0.0001992, whisper_loss=0.06365, over 16185.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0002, whisper_loss=0.09318, over 3880929.43 frames. 
], batch size: 64, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:19:01,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116470.0, ans=0.1 2024-08-11 13:19:06,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1116470.0, ans=0.1 2024-08-11 13:19:13,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 13:19:28,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116570.0, ans=0.1 2024-08-11 13:19:28,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1116570.0, ans=0.0 2024-08-11 13:19:34,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116670.0, ans=0.1 2024-08-11 13:19:45,867 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 13:19:57,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-11 13:20:00,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1116770.0, ans=0.125 2024-08-11 13:20:10,502 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 13:20:19,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10250, loss[loss=0.09678, beats_loss=0.01141, ecapa_loss=0.0001836, whisper_loss=0.08353, over 20418.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0113, ecapa_loss=0.0001991, whisper_loss=0.09334, over 3869927.20 frames. 
], batch size: 80, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:20:23,868 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.664e+01 3.001e+01 3.567e+01 5.136e+01, threshold=6.003e+01, percent-clipped=0.0 2024-08-11 13:20:33,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1116970.0, ans=0.125 2024-08-11 13:20:38,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1117070.0, ans=0.0 2024-08-11 13:20:46,584 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 13:20:53,348 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 13:21:02,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-11 13:21:07,846 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 13:21:12,753 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 13:21:16,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-11 13:21:21,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1117270.0, ans=0.125 2024-08-11 13:21:38,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1117370.0, ans=0.0 2024-08-11 13:21:40,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10300, loss[loss=0.1038, beats_loss=0.009282, ecapa_loss=0.0002264, whisper_loss=0.09226, over 13459.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01136, ecapa_loss=0.000198, whisper_loss=0.09294, over 3889162.49 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:21:42,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1117470.0, ans=0.0 2024-08-11 13:22:00,669 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 13:22:40,967 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 13:22:42,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117770.0, ans=0.1 2024-08-11 13:22:50,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1117870.0, ans=0.2 2024-08-11 13:22:52,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0 2024-08-11 13:22:53,597 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:23:01,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10350, loss[loss=0.1, beats_loss=0.008463, ecapa_loss=0.0002215, whisper_loss=0.08933, over 15483.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01128, ecapa_loss=0.0001985, whisper_loss=0.09374, over 3915266.78 frames. ], batch size: 60, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:23:02,353 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 13:23:06,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.796e+01 3.108e+01 3.786e+01 6.316e+01, threshold=6.215e+01, percent-clipped=1.0 2024-08-11 13:23:36,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-08-11 13:23:50,217 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 13:24:06,231 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 13:24:07,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2024-08-11 13:24:09,998 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:24:17,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1118470.0, ans=0.0 2024-08-11 13:24:18,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10400, loss[loss=0.1291, beats_loss=0.01098, ecapa_loss=0.0001478, whisper_loss=0.1166, over 24554.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01127, ecapa_loss=0.0001991, whisper_loss=0.09378, over 3900488.84 frames. ], batch size: 91, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:24:26,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1118470.0, ans=0.0 2024-08-11 13:24:26,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1118470.0, ans=0.125 2024-08-11 13:24:47,852 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 13:24:54,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1118670.0, ans=0.1 2024-08-11 13:25:02,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1118670.0, ans=0.125 2024-08-11 13:25:13,050 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 13:25:17,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1118770.0, ans=0.05 2024-08-11 13:25:35,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10450, loss[loss=0.09444, beats_loss=0.01232, ecapa_loss=0.0002457, whisper_loss=0.07966, over 22878.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01127, ecapa_loss=0.0001994, whisper_loss=0.09343, over 3859939.98 frames. ], batch size: 100, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:25:37,006 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 13:25:39,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.711e+01 3.019e+01 3.517e+01 4.993e+01, threshold=6.039e+01, percent-clipped=0.0 2024-08-11 13:25:48,281 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 13:26:08,578 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-11 13:26:08,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1119170.0, ans=0.125 2024-08-11 13:26:09,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.20 vs. 
limit=15.0 2024-08-11 13:26:19,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-11 13:26:35,146 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 13:26:45,065 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-11 13:26:51,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1119370.0, ans=0.0 2024-08-11 13:26:53,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10500, loss[loss=0.11, beats_loss=0.009539, ecapa_loss=0.0001903, whisper_loss=0.09854, over 18233.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01134, ecapa_loss=0.0001991, whisper_loss=0.09172, over 3823528.15 frames. ], batch size: 70, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:26:57,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119470.0, ans=0.1 2024-08-11 13:27:12,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. 
limit=22.5 2024-08-11 13:27:28,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1119670.0, ans=0.125 2024-08-11 13:27:29,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119670.0, ans=0.1 2024-08-11 13:27:37,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1119670.0, ans=0.0 2024-08-11 13:27:59,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1119870.0, ans=0.0 2024-08-11 13:28:05,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1119870.0, ans=0.0 2024-08-11 13:28:07,553 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 13:28:08,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1119870.0, ans=0.125 2024-08-11 13:28:11,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10550, loss[loss=0.1153, beats_loss=0.01226, ecapa_loss=0.00022, whisper_loss=0.1008, over 21411.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01142, ecapa_loss=0.0001981, whisper_loss=0.09196, over 3844947.41 frames. 
], batch size: 85, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:28:17,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.650e+01 3.072e+01 3.667e+01 9.491e+01, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:28:20,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1119970.0, ans=0.2 2024-08-11 13:28:22,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-11 13:28:59,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1120270.0, ans=0.125 2024-08-11 13:29:13,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1120270.0, ans=10.0 2024-08-11 13:29:23,461 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 13:29:25,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1120370.0, ans=0.0 2024-08-11 13:29:33,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10600, loss[loss=0.09449, beats_loss=0.01316, ecapa_loss=0.0002155, whisper_loss=0.07917, over 22471.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01137, ecapa_loss=0.0001983, whisper_loss=0.09266, over 3873426.10 frames. ], batch size: 91, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:29:59,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1120570.0, ans=10.0 2024-08-11 13:30:04,307 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 13:30:10,297 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 13:30:17,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1120670.0, ans=0.0 2024-08-11 13:30:17,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1120670.0, ans=0.1 2024-08-11 13:30:22,581 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 13:30:24,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1120770.0, ans=0.125 2024-08-11 13:30:50,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10650, loss[loss=0.1237, beats_loss=0.009721, ecapa_loss=0.0001998, whisper_loss=0.112, over 21775.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01141, ecapa_loss=0.000197, whisper_loss=0.09257, over 3865821.68 frames. ], batch size: 84, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:30:57,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.737e+01 3.110e+01 3.500e+01 6.521e+01, threshold=6.221e+01, percent-clipped=1.0 2024-08-11 13:31:08,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1121070.0, ans=0.0 2024-08-11 13:31:09,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1121070.0, ans=0.125 2024-08-11 13:31:13,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1121070.0, ans=0.125 2024-08-11 13:31:23,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1121170.0, ans=0.125 2024-08-11 13:32:05,743 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 13:32:05,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1121370.0, ans=0.125 2024-08-11 13:32:10,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10700, loss[loss=0.1068, beats_loss=0.009434, ecapa_loss=0.0002033, whisper_loss=0.09536, over 16643.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01123, ecapa_loss=0.0001976, whisper_loss=0.09414, over 3867360.76 frames. ], batch size: 65, lr: 7.77e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:32:22,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1121470.0, ans=0.0 2024-08-11 13:32:28,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1121570.0, ans=0.2 2024-08-11 13:32:39,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1121570.0, ans=0.0 2024-08-11 13:32:42,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=12.0 2024-08-11 13:32:46,713 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 13:32:50,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. 
limit=15.0 2024-08-11 13:32:54,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1121670.0, ans=0.125 2024-08-11 13:33:00,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1121770.0, ans=0.0 2024-08-11 13:33:00,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1121770.0, ans=0.1 2024-08-11 13:33:01,634 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 13:33:16,870 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 13:33:31,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10750, loss[loss=0.1001, beats_loss=0.01077, ecapa_loss=0.0001809, whisper_loss=0.08752, over 14755.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01121, ecapa_loss=0.0001994, whisper_loss=0.09464, over 3885802.40 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:33:38,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.776e+01 3.070e+01 3.397e+01 5.449e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-11 13:33:48,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1122070.0, ans=0.0 2024-08-11 13:33:54,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1122070.0, ans=0.125 2024-08-11 13:34:05,379 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.698e-02 2024-08-11 13:34:31,722 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 13:34:49,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1122470.0, ans=0.125 2024-08-11 13:34:49,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10800, loss[loss=0.09388, beats_loss=0.01218, ecapa_loss=0.0002038, whisper_loss=0.07966, over 18503.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01114, ecapa_loss=0.0001996, whisper_loss=0.09531, over 3914507.49 frames. ], batch size: 77, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:35:22,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1122670.0, ans=0.125 2024-08-11 13:35:27,908 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 13:35:30,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1122670.0, ans=0.0 2024-08-11 13:35:32,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1122670.0, ans=0.125 2024-08-11 13:35:46,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1122770.0, ans=0.05 2024-08-11 13:35:47,284 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 13:35:47,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122770.0, ans=0.1 2024-08-11 13:35:51,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1122870.0, ans=0.125 2024-08-11 13:35:52,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122870.0, ans=0.1 2024-08-11 13:35:57,509 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 13:36:00,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1122870.0, ans=0.0 2024-08-11 13:36:01,747 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 13:36:07,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10850, loss[loss=0.1106, beats_loss=0.01164, ecapa_loss=0.0002184, whisper_loss=0.09679, over 22608.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0112, ecapa_loss=0.0001996, whisper_loss=0.09499, over 3928490.73 frames. ], batch size: 90, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:36:09,420 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 13:36:14,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2024-08-11 13:36:15,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.852e+01 3.448e+01 4.280e+01 7.389e+01, threshold=6.896e+01, percent-clipped=2.0 2024-08-11 13:36:26,120 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 13:36:59,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1123270.0, ans=0.05 2024-08-11 13:37:09,624 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 13:37:15,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1123370.0, ans=0.125 2024-08-11 13:37:21,518 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 13:37:28,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1123470.0, ans=0.0 2024-08-11 13:37:29,103 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10900, loss[loss=0.09051, beats_loss=0.01303, ecapa_loss=0.0002111, whisper_loss=0.07537, over 19950.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01129, ecapa_loss=0.0001985, whisper_loss=0.09398, over 3931483.21 frames. 
], batch size: 85, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:37:37,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1123470.0, ans=0.125 2024-08-11 13:37:50,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1123570.0, ans=0.2 2024-08-11 13:37:57,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1123570.0, ans=0.125 2024-08-11 13:37:57,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1123570.0, ans=0.125 2024-08-11 13:37:58,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1123570.0, ans=0.125 2024-08-11 13:38:00,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-11 13:38:37,552 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 13:38:41,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1123870.0, ans=0.125 2024-08-11 13:38:45,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 10950, loss[loss=0.08864, beats_loss=0.008655, ecapa_loss=0.0001878, whisper_loss=0.07811, over 16203.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0001984, whisper_loss=0.09415, over 3927725.96 frames. 
], batch size: 62, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:38:48,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1123970.0, ans=0.125 2024-08-11 13:38:51,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1123970.0, ans=0.1 2024-08-11 13:38:53,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.774e+01 3.085e+01 3.666e+01 6.229e+01, threshold=6.171e+01, percent-clipped=0.0 2024-08-11 13:39:04,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1124070.0, ans=0.0 2024-08-11 13:39:10,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1124070.0, ans=0.125 2024-08-11 13:39:17,333 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 13:39:20,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1124170.0, ans=0.125 2024-08-11 13:39:26,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1124170.0, ans=0.05 2024-08-11 13:39:40,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1124270.0, ans=0.125 2024-08-11 13:39:44,212 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 13:39:46,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. 
limit=15.0 2024-08-11 13:39:52,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-11 13:39:53,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1124370.0, ans=0.2 2024-08-11 13:40:00,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1124370.0, ans=0.125 2024-08-11 13:40:03,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11000, loss[loss=0.107, beats_loss=0.01148, ecapa_loss=0.0002398, whisper_loss=0.09313, over 18307.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01124, ecapa_loss=0.0001994, whisper_loss=0.09378, over 3913096.08 frames. ], batch size: 76, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:40:03,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-11 13:40:14,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=6.0 2024-08-11 13:40:16,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1124470.0, ans=0.125 2024-08-11 13:40:18,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1124570.0, ans=0.0 2024-08-11 13:40:21,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1124570.0, ans=0.0 2024-08-11 13:40:29,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.77 vs. 
limit=22.5 2024-08-11 13:40:30,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1124570.0, ans=0.125 2024-08-11 13:40:32,023 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 13:40:47,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:40:50,277 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 13:40:58,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2024-08-11 13:41:01,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-11 13:41:22,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11050, loss[loss=0.09989, beats_loss=0.01164, ecapa_loss=0.0002076, whisper_loss=0.08617, over 15148.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01123, ecapa_loss=0.0002001, whisper_loss=0.09384, over 3921285.91 frames. ], batch size: 60, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:41:27,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1124970.0, ans=0.2 2024-08-11 13:41:29,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.704e+01 3.049e+01 3.665e+01 6.034e+01, threshold=6.098e+01, percent-clipped=0.0 2024-08-11 13:41:41,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1125070.0, ans=0.125 2024-08-11 13:41:50,838 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 13:42:03,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2024-08-11 13:42:12,605 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 13:42:37,915 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 13:42:39,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11100, loss[loss=0.1134, beats_loss=0.009351, ecapa_loss=0.0002573, whisper_loss=0.1015, over 18812.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01125, ecapa_loss=0.0002005, whisper_loss=0.09316, over 3904709.39 frames. ], batch size: 77, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:42:46,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1125470.0, ans=0.0 2024-08-11 13:42:53,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-11 13:43:23,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-11 13:43:30,388 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 13:43:36,688 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 13:43:39,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1125770.0, ans=0.125 2024-08-11 13:43:48,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1125870.0, ans=0.2 2024-08-11 13:44:01,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11150, loss[loss=0.1141, beats_loss=0.008605, ecapa_loss=0.0002284, whisper_loss=0.1032, over 15738.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01116, ecapa_loss=0.0002004, whisper_loss=0.09369, over 3895734.10 frames. ], batch size: 61, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:44:06,048 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 30 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 13:44:09,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.627e+01 3.035e+01 3.415e+01 6.543e+01, threshold=6.070e+01, percent-clipped=1.0 2024-08-11 13:44:12,762 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 13:44:23,973 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 8 from Vox, 31 fro AS 2024-08-11 13:44:28,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126070.0, ans=0.1 2024-08-11 13:44:36,673 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 13:44:40,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1126170.0, ans=0.1 2024-08-11 13:44:42,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:44:42,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1126170.0, ans=0.05 2024-08-11 13:44:53,349 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 13:45:03,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2024-08-11 13:45:13,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1126370.0, ans=0.125 2024-08-11 13:45:17,313 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 13:45:18,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11200, loss[loss=0.107, beats_loss=0.01239, ecapa_loss=0.000165, whisper_loss=0.09298, over 20162.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0002003, whisper_loss=0.09375, over 3876154.77 frames. 
], batch size: 79, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:45:22,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1126470.0, ans=0.0 2024-08-11 13:45:32,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1126470.0, ans=0.05 2024-08-11 13:45:42,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1126570.0, ans=0.125 2024-08-11 13:45:53,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1126670.0, ans=0.0 2024-08-11 13:45:59,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1126670.0, ans=0.125 2024-08-11 13:46:03,304 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 13:46:15,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1126770.0, ans=0.125 2024-08-11 13:46:19,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1126770.0, ans=0.5 2024-08-11 13:46:21,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1126770.0, ans=0.125 2024-08-11 13:46:26,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1126870.0, ans=0.125 2024-08-11 13:46:27,830 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-11 13:46:38,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.59 vs. 
limit=15.0
2024-08-11 13:46:42,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1126970.0, ans=0.125
2024-08-11 13:46:42,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11250, loss[loss=0.1041, beats_loss=0.01187, ecapa_loss=0.0002005, whisper_loss=0.09027, over 20093.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01112, ecapa_loss=0.0002001, whisper_loss=0.0946, over 3907262.23 frames. ], batch size: 83, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:46:52,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.684e+01 2.944e+01 3.546e+01 6.829e+01, threshold=5.887e+01, percent-clipped=2.0
2024-08-11 13:47:26,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0
2024-08-11 13:47:48,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5
2024-08-11 13:47:52,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1127370.0, ans=0.125
2024-08-11 13:48:05,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11300, loss[loss=0.1014, beats_loss=0.007849, ecapa_loss=0.0002148, whisper_loss=0.09141, over 15361.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01109, ecapa_loss=0.0002007, whisper_loss=0.09379, over 3877911.57 frames. ], batch size: 60, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:48:23,685 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
34 from LS+wenet, 27 from Vox, 27 from AS
2024-08-11 13:48:47,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1127670.0, ans=0.125
2024-08-11 13:48:48,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1127670.0, ans=0.125
2024-08-11 13:49:07,152 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 from AS
2024-08-11 13:49:21,093 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 from AS
2024-08-11 13:49:25,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11350, loss[loss=0.1108, beats_loss=0.01013, ecapa_loss=0.0002311, whisper_loss=0.09836, over 22135.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0111, ecapa_loss=0.0001985, whisper_loss=0.09376, over 3878040.87 frames. ], batch size: 89, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:49:33,904 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.648e+01 3.083e+01 3.583e+01 5.645e+01, threshold=6.165e+01, percent-clipped=0.0
2024-08-11 13:49:52,899 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 18 from Vox, 37 from AS
2024-08-11 13:49:56,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5
2024-08-11 13:50:00,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.20 vs. limit=15.0
2024-08-11 13:50:44,165 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11400, loss[loss=0.112, beats_loss=0.0098, ecapa_loss=0.0002236, whisper_loss=0.09994, over 19806.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01108, ecapa_loss=0.0001991, whisper_loss=0.09405, over 3899516.00 frames.
], batch size: 79, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:51:05,183 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 from AS
2024-08-11 13:51:05,571 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.315e-02
2024-08-11 13:51:08,166 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 14 from LS+wenet, 28 from Vox, 31 from AS
2024-08-11 13:51:08,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1128570.0, ans=0.0
2024-08-11 13:51:08,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1128570.0, ans=0.125
2024-08-11 13:51:09,774 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 from AS
2024-08-11 13:51:25,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1128670.0, ans=0.1
2024-08-11 13:51:51,454 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 from AS
2024-08-11 13:51:57,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1128870.0, ans=0.1
2024-08-11 13:51:59,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11450, loss[loss=0.107, beats_loss=0.01279, ecapa_loss=0.0001544, whisper_loss=0.09266, over 22198.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0112, ecapa_loss=0.0001978, whisper_loss=0.09395, over 3918787.66 frames.
], batch size: 88, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:52:07,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.738e+01 3.140e+01 3.413e+01 5.128e+01, threshold=6.280e+01, percent-clipped=0.0
2024-08-11 13:52:15,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1129070.0, ans=0.125
2024-08-11 13:52:39,871 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 13:52:45,884 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS
2024-08-11 13:52:48,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0
2024-08-11 13:52:52,532 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 from AS
2024-08-11 13:52:55,940 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 28 from LS+wenet, 17 from Vox, 17 from AS
2024-08-11 13:53:17,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11500, loss[loss=0.1193, beats_loss=0.012, ecapa_loss=0.0001559, whisper_loss=0.1058, over 19406.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01117, ecapa_loss=0.0001982, whisper_loss=0.09436, over 3898772.03 frames. ], batch size: 74, lr: 7.75e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:53:25,364 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS
2024-08-11 13:53:26,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1129470.0, ans=0.5
2024-08-11 13:53:27,759 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-11 13:53:34,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1129570.0, ans=0.125
2024-08-11 13:53:37,095 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 40 from LS+wenet, 17 from Vox, 31 from AS
2024-08-11 13:53:56,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.176e-01
2024-08-11 13:53:57,402 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 from AS
2024-08-11 13:54:03,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0
2024-08-11 13:54:13,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1129770.0, ans=0.0
2024-08-11 13:54:21,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129870.0, ans=0.1
2024-08-11 13:54:35,005 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-11 13:54:36,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11550, loss[loss=0.1142, beats_loss=0.01089, ecapa_loss=0.0002196, whisper_loss=0.1011, over 23083.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01119, ecapa_loss=0.0001981, whisper_loss=0.09442, over 3917852.72 frames. ], batch size: 92, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:54:45,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.847e+01 3.236e+01 3.830e+01 5.730e+01, threshold=6.473e+01, percent-clipped=0.0
2024-08-11 13:54:56,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.07 vs.
limit=15.0
2024-08-11 13:54:59,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1130070.0, ans=0.1
2024-08-11 13:55:04,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1130070.0, ans=0.025
2024-08-11 13:55:04,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1130070.0, ans=0.05
2024-08-11 13:55:27,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1130170.0, ans=0.1
2024-08-11 13:55:28,816 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 from AS
2024-08-11 13:55:36,020 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 from AS
2024-08-11 13:55:37,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1130270.0, ans=10.0
2024-08-11 13:55:39,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1130270.0, ans=0.125
2024-08-11 13:55:50,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2024-08-11 13:55:51,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0
2024-08-11 13:55:56,971 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 from AS
2024-08-11 13:55:59,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11600, loss[loss=0.1089, beats_loss=0.01258, ecapa_loss=0.0001373, whisper_loss=0.09496, over 18017.00 frames.
], tot_loss[loss=0.1073, beats_loss=0.01114, ecapa_loss=0.0001985, whisper_loss=0.09418, over 3904249.18 frames. ], batch size: 67, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:56:09,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1130470.0, ans=0.125
2024-08-11 13:56:26,798 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 17 from Vox, 24 from AS
2024-08-11 13:56:34,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1130670.0, ans=0.2
2024-08-11 13:57:05,556 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 from AS
2024-08-11 13:57:05,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1130870.0, ans=0.0
2024-08-11 13:57:10,209 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS
2024-08-11 13:57:10,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1130870.0, ans=0.5
2024-08-11 13:57:12,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1130870.0, ans=0.0
2024-08-11 13:57:17,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11650, loss[loss=0.09735, beats_loss=0.01142, ecapa_loss=0.0001946, whisper_loss=0.08398, over 14011.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.000198, whisper_loss=0.09352, over 3913800.65 frames.
], batch size: 59, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:57:26,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.647e+01 2.966e+01 3.476e+01 5.523e+01, threshold=5.933e+01, percent-clipped=0.0
2024-08-11 13:57:32,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1130970.0, ans=0.0
2024-08-11 13:57:33,384 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 15 from LS+wenet, 29 from Vox, 27 from AS
2024-08-11 13:57:36,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1131070.0, ans=0.09899494936611666
2024-08-11 13:57:49,799 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS
2024-08-11 13:58:16,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1131270.0, ans=0.125
2024-08-11 13:58:17,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1131270.0, ans=0.0
2024-08-11 13:58:24,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1131370.0, ans=0.125
2024-08-11 13:58:30,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1131370.0, ans=0.0
2024-08-11 13:58:35,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11700, loss[loss=0.1084, beats_loss=0.009754, ecapa_loss=0.0001837, whisper_loss=0.09686, over 15722.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01132, ecapa_loss=0.0002006, whisper_loss=0.09323, over 3896479.69 frames. ], batch size: 58, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 13:58:40,317 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
22 from LS+wenet, 16 from Vox, 20 from AS
2024-08-11 13:58:43,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1131470.0, ans=10.0
2024-08-11 13:58:55,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1131570.0, ans=0.0
2024-08-11 13:59:00,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1131570.0, ans=0.125
2024-08-11 13:59:02,096 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-11 13:59:15,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1131670.0, ans=0.07
2024-08-11 13:59:21,711 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 17 from Vox, 48 from AS
2024-08-11 13:59:51,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1131870.0, ans=0.2
2024-08-11 13:59:53,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0
2024-08-11 13:59:53,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2024-08-11 13:59:56,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5
2024-08-11 13:59:56,809 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11750, loss[loss=0.1151, beats_loss=0.01072, ecapa_loss=0.0001916, whisper_loss=0.1025, over 22329.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01145, ecapa_loss=0.0001988, whisper_loss=0.09272, over 3893439.02 frames.
], batch size: 88, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:00:04,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+01 2.835e+01 3.323e+01 3.805e+01 1.328e+02, threshold=6.647e+01, percent-clipped=1.0
2024-08-11 14:00:08,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1131970.0, ans=0.125
2024-08-11 14:00:10,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1131970.0, ans=0.0
2024-08-11 14:00:20,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1132070.0, ans=0.125
2024-08-11 14:00:20,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1132070.0, ans=0.125
2024-08-11 14:00:51,509 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS
2024-08-11 14:00:53,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1132270.0, ans=0.04949747468305833
2024-08-11 14:01:08,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5
2024-08-11 14:01:11,370 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 14:01:15,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11800, loss[loss=0.1259, beats_loss=0.00898, ecapa_loss=0.0002746, whisper_loss=0.1142, over 16599.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01144, ecapa_loss=0.0001981, whisper_loss=0.09262, over 3889241.49 frames.
], batch size: 74, lr: 7.74e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:01:29,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0
2024-08-11 14:01:49,181 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-11 14:01:49,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1132670.0, ans=0.0
2024-08-11 14:01:49,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0
2024-08-11 14:01:52,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1132670.0, ans=0.0
2024-08-11 14:02:02,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1132770.0, ans=0.125
2024-08-11 14:02:03,582 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.189e-02
2024-08-11 14:02:08,606 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS
2024-08-11 14:02:26,559 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 16 from Vox, 43 from AS
2024-08-11 14:02:30,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11850, loss[loss=0.1164, beats_loss=0.009492, ecapa_loss=0.0002342, whisper_loss=0.1046, over 18387.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01154, ecapa_loss=0.0001976, whisper_loss=0.09202, over 3905241.78 frames.
], batch size: 76, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:02:31,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1132970.0, ans=0.125
2024-08-11 14:02:38,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.696e+01 3.020e+01 3.645e+01 5.662e+01, threshold=6.041e+01, percent-clipped=0.0
2024-08-11 14:02:46,203 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 30 from Vox, 43 from AS
2024-08-11 14:02:59,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0
2024-08-11 14:03:25,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1133270.0, ans=0.125
2024-08-11 14:03:25,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0
2024-08-11 14:03:26,311 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS
2024-08-11 14:03:36,404 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 23 from Vox, 24 from AS
2024-08-11 14:03:46,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11900, loss[loss=0.1152, beats_loss=0.01, ecapa_loss=0.0002297, whisper_loss=0.1029, over 14188.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01153, ecapa_loss=0.0001978, whisper_loss=0.09265, over 3922277.47 frames. ], batch size: 58, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:03:50,503 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.924e-02
2024-08-11 14:04:05,436 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
25 from LS+wenet, 15 from Vox, 39 from AS
2024-08-11 14:04:07,185 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS
2024-08-11 14:04:11,784 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 16 from Vox, 20 from AS
2024-08-11 14:04:12,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1133570.0, ans=0.125
2024-08-11 14:04:16,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0
2024-08-11 14:04:20,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1133670.0, ans=0.125
2024-08-11 14:04:22,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1133670.0, ans=0.125
2024-08-11 14:04:25,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1133670.0, ans=0.0
2024-08-11 14:04:42,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1133770.0, ans=0.0
2024-08-11 14:04:51,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1133870.0, ans=0.0
2024-08-11 14:04:52,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1133870.0, ans=0.125
2024-08-11 14:05:03,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1133970.0, ans=0.07
2024-08-11 14:05:04,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 11950, loss[loss=0.1143, beats_loss=0.00967, ecapa_loss=0.000205, whisper_loss=0.1025, over 15291.00
frames. ], tot_loss[loss=0.1062, beats_loss=0.01142, ecapa_loss=0.0001987, whisper_loss=0.09282, over 3914214.64 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:05:10,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1133970.0, ans=0.0
2024-08-11 14:05:12,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.574e+01 2.891e+01 3.292e+01 6.091e+01, threshold=5.783e+01, percent-clipped=1.0
2024-08-11 14:05:13,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5
2024-08-11 14:05:24,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1134070.0, ans=0.1
2024-08-11 14:05:48,331 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 14:06:04,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1134270.0, ans=0.1
2024-08-11 14:06:04,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0
2024-08-11 14:06:24,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12000, loss[loss=0.1002, beats_loss=0.01198, ecapa_loss=0.0001743, whisper_loss=0.08651, over 23181.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01135, ecapa_loss=0.0001988, whisper_loss=0.09297, over 3907093.10 frames.
], batch size: 93, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:06:24,371 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-11 14:07:03,238 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006428, whisper_loss=0.2514, over 922467.00 frames.
2024-08-11 14:07:22,395 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on SV_voxceleb1: loss=0.005208, beats_loss=0, ecapa_loss=0.0005208, whisper_loss=0, over 939242.00 frames.
2024-08-11 14:09:12,801 INFO [train_multi_KD3.py:1149] (3/4) Epoch 8, validation on AT_audioset: loss=0.02509, beats_loss=0.02509, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 14:09:12,805 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-11 14:09:14,196 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 from AS
2024-08-11 14:09:19,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.997e-01
2024-08-11 14:09:24,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0
2024-08-11 14:09:35,468 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 from AS
2024-08-11 14:09:39,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-08-11 14:09:39,859 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 from AS
2024-08-11 14:09:55,016 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
19 from LS+wenet, 14 from Vox, 23 from AS
2024-08-11 14:09:56,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1134770.0, ans=0.125
2024-08-11 14:10:07,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1134770.0, ans=0.125
2024-08-11 14:10:09,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1134770.0, ans=0.0
2024-08-11 14:10:19,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1134870.0, ans=0.2
2024-08-11 14:10:26,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12050, loss[loss=0.0948, beats_loss=0.01137, ecapa_loss=0.0002012, whisper_loss=0.08142, over 21680.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01135, ecapa_loss=0.0001984, whisper_loss=0.09278, over 3874604.88 frames. ], batch size: 88, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:10:34,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.739e+01 2.961e+01 3.556e+01 5.317e+01, threshold=5.922e+01, percent-clipped=0.0
2024-08-11 14:10:35,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1134970.0, ans=0.125
2024-08-11 14:10:35,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1134970.0, ans=0.125
2024-08-11 14:10:38,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1134970.0, ans=0.0
2024-08-11 14:10:43,028 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts.
22 from LS+wenet, 21 from Vox, 21 from AS
2024-08-11 14:11:02,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1135170.0, ans=0.125
2024-08-11 14:11:05,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1135170.0, ans=0.0
2024-08-11 14:11:11,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1135270.0, ans=0.125
2024-08-11 14:11:14,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1135270.0, ans=0.0
2024-08-11 14:11:19,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1135270.0, ans=0.07
2024-08-11 14:11:21,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.31 vs. limit=22.5
2024-08-11 14:11:26,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1135370.0, ans=0.125
2024-08-11 14:11:42,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12100, loss[loss=0.09168, beats_loss=0.01156, ecapa_loss=0.0002553, whisper_loss=0.07757, over 18905.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01126, ecapa_loss=0.0001987, whisper_loss=0.09346, over 3872671.15 frames. ], batch size: 81, lr: 7.73e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:11:49,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1135470.0, ans=0.0
2024-08-11 14:11:55,910 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
16 from LS+wenet, 24 from Vox, 26 from AS
2024-08-11 14:12:08,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1135570.0, ans=0.1
2024-08-11 14:12:14,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1135670.0, ans=0.125
2024-08-11 14:12:35,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1135770.0, ans=0.04949747468305833
2024-08-11 14:12:52,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12150, loss[loss=0.1105, beats_loss=0.01232, ecapa_loss=0.0001775, whisper_loss=0.09641, over 16878.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01125, ecapa_loss=0.0001993, whisper_loss=0.09251, over 3852252.31 frames. ], batch size: 65, lr: 7.72e-03, grad_scale: 5.764607523034235e+17
2024-08-11 14:12:59,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.505e+01 2.843e+01 3.166e+01 1.229e+02, threshold=5.686e+01, percent-clipped=1.0
2024-08-11 14:13:06,472 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 from AS
2024-08-11 14:13:09,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1136070.0, ans=0.0
2024-08-11 14:13:10,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.12 vs. limit=10.0
2024-08-11 14:13:14,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1136070.0, ans=0.2
2024-08-11 14:13:24,200 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-11 14:13:24,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1136170.0, ans=0.0 2024-08-11 14:13:26,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1136170.0, ans=0.05 2024-08-11 14:13:29,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1136170.0, ans=0.125 2024-08-11 14:13:52,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1136370.0, ans=0.0 2024-08-11 14:13:53,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1136370.0, ans=0.025 2024-08-11 14:13:58,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1136370.0, ans=0.125 2024-08-11 14:14:00,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12200, loss[loss=0.08456, beats_loss=0.01358, ecapa_loss=0.0001992, whisper_loss=0.06899, over 20377.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01124, ecapa_loss=0.0001996, whisper_loss=0.09277, over 3854747.56 frames. ], batch size: 85, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:14:31,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=12.0 2024-08-11 14:14:41,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2024-08-11 14:14:44,445 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-11 14:14:44,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1136770.0, ans=0.04949747468305833 2024-08-11 14:14:46,543 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.406e-02 2024-08-11 14:14:50,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1136770.0, ans=0.0 2024-08-11 14:15:02,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1136870.0, ans=0.2 2024-08-11 14:15:09,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12250, loss[loss=0.08847, beats_loss=0.01117, ecapa_loss=0.0002274, whisper_loss=0.07503, over 16173.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001994, whisper_loss=0.09284, over 3827457.20 frames. ], batch size: 67, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:15:14,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1136970.0, ans=0.2 2024-08-11 14:15:16,445 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.710e+01 3.098e+01 3.529e+01 5.582e+01, threshold=6.197e+01, percent-clipped=0.0 2024-08-11 14:15:17,873 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 14:15:22,169 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 14:15:29,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.23 vs. 
limit=12.0 2024-08-11 14:15:43,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1137170.0, ans=0.125 2024-08-11 14:15:57,611 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 14:16:04,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1137370.0, ans=0.125 2024-08-11 14:16:11,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1137370.0, ans=0.0 2024-08-11 14:16:19,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12300, loss[loss=0.1161, beats_loss=0.01222, ecapa_loss=0.0001823, whisper_loss=0.1021, over 17325.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01123, ecapa_loss=0.0002001, whisper_loss=0.09267, over 3839035.31 frames. ], batch size: 67, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:16:23,356 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 14:16:28,702 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 14:16:31,327 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 14:16:34,135 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 14:16:34,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1137570.0, ans=0.0 2024-08-11 14:16:40,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1137570.0, ans=0.1 2024-08-11 14:16:47,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1137670.0, ans=0.2 2024-08-11 14:16:47,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-08-11 14:16:49,975 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 26 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 14:16:51,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1137670.0, ans=0.125 2024-08-11 14:17:28,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12350, loss[loss=0.114, beats_loss=0.01149, ecapa_loss=0.0001722, whisper_loss=0.1007, over 22294.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0002007, whisper_loss=0.09288, over 3867345.88 frames. ], batch size: 88, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:17:31,893 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
12 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 14:17:36,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.771e+01 3.079e+01 3.408e+01 5.279e+01, threshold=6.158e+01, percent-clipped=0.0 2024-08-11 14:17:40,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137970.0, ans=0.1 2024-08-11 14:17:46,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1138070.0, ans=0.0 2024-08-11 14:18:07,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1138170.0, ans=0.125 2024-08-11 14:18:16,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1138270.0, ans=0.125 2024-08-11 14:18:19,424 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 14:18:31,447 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 14:18:34,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1138370.0, ans=0.0 2024-08-11 14:18:40,605 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 14:18:41,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12400, loss[loss=0.1062, beats_loss=0.01273, ecapa_loss=0.0001831, whisper_loss=0.09168, over 18226.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01126, ecapa_loss=0.0001998, whisper_loss=0.09241, over 3857732.93 frames. 
], batch size: 72, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:18:42,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1138470.0, ans=0.125 2024-08-11 14:18:50,630 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 14:18:51,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-11 14:19:06,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1138570.0, ans=0.125 2024-08-11 14:19:10,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1138670.0, ans=0.125 2024-08-11 14:19:17,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.183e+02 2024-08-11 14:19:17,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-11 14:19:25,412 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 14:19:37,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1138870.0, ans=10.0 2024-08-11 14:19:39,833 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-11 14:19:51,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12450, loss[loss=0.1135, beats_loss=0.0134, ecapa_loss=0.0001695, whisper_loss=0.09839, over 22539.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01123, ecapa_loss=0.0001986, whisper_loss=0.09273, over 3846236.38 frames. 
], batch size: 94, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:19:56,588 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 14:19:59,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.783e+01 3.134e+01 3.561e+01 9.376e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-11 14:20:11,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1139070.0, ans=0.125 2024-08-11 14:20:14,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1139070.0, ans=0.125 2024-08-11 14:20:21,343 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 14:20:40,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1139270.0, ans=0.2 2024-08-11 14:20:45,869 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 14:20:47,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1139270.0, ans=0.2 2024-08-11 14:20:53,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1139370.0, ans=0.0 2024-08-11 14:20:55,915 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 14:20:57,160 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 14:21:04,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12500, loss[loss=0.1164, beats_loss=0.01247, ecapa_loss=0.0002001, whisper_loss=0.102, over 22285.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01124, ecapa_loss=0.0001977, whisper_loss=0.09292, over 3867583.04 frames. 
], batch size: 90, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:21:11,890 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 14:21:30,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1139570.0, ans=0.125 2024-08-11 14:21:33,715 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 14:21:46,368 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 14:21:48,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1139670.0, ans=0.2 2024-08-11 14:21:54,545 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 14:21:55,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2024-08-11 14:22:00,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1139770.0, ans=0.125 2024-08-11 14:22:08,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1139870.0, ans=0.2 2024-08-11 14:22:15,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139870.0, ans=0.1 2024-08-11 14:22:20,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12550, loss[loss=0.123, beats_loss=0.01179, ecapa_loss=0.0001435, whisper_loss=0.1097, over 21706.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01123, ecapa_loss=0.0001967, whisper_loss=0.09411, over 3906979.57 frames. 
], batch size: 79, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:22:27,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.780e+01 3.157e+01 3.733e+01 7.024e+01, threshold=6.315e+01, percent-clipped=2.0 2024-08-11 14:22:32,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1139970.0, ans=0.0 2024-08-11 14:22:36,196 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 14:22:38,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1140070.0, ans=0.0 2024-08-11 14:22:39,326 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 14:22:42,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1140070.0, ans=15.0 2024-08-11 14:22:43,467 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 14:22:49,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1140170.0, ans=0.0 2024-08-11 14:22:49,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-11 14:22:50,447 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 34 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 14:22:58,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-11 14:23:07,993 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 14:23:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1140270.0, ans=0.0 2024-08-11 14:23:32,003 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 14:23:33,344 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 14:23:34,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12600, loss[loss=0.0864, beats_loss=0.01324, ecapa_loss=0.0001668, whisper_loss=0.07149, over 22652.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0113, ecapa_loss=0.0001965, whisper_loss=0.09426, over 3912226.25 frames. ], batch size: 92, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:23:34,835 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 14:23:43,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1140470.0, ans=0.0 2024-08-11 14:23:50,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1140570.0, ans=0.07 2024-08-11 14:23:52,043 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 14:23:58,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2024-08-11 14:24:07,273 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
18 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 14:24:10,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1140670.0, ans=0.125 2024-08-11 14:24:13,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1140670.0, ans=0.125 2024-08-11 14:24:13,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1140670.0, ans=0.125 2024-08-11 14:24:24,705 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 14:24:26,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1140770.0, ans=0.2 2024-08-11 14:24:48,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12650, loss[loss=0.1173, beats_loss=0.0123, ecapa_loss=0.0002053, whisper_loss=0.103, over 21329.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01132, ecapa_loss=0.0001979, whisper_loss=0.09386, over 3892036.49 frames. ], batch size: 88, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:24:55,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.818e+01 3.225e+01 3.809e+01 6.974e+01, threshold=6.451e+01, percent-clipped=1.0 2024-08-11 14:25:16,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-11 14:25:22,456 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 14:25:24,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1141170.0, ans=0.0 2024-08-11 14:25:31,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141270.0, ans=0.1 2024-08-11 14:26:00,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1141470.0, ans=0.125 2024-08-11 14:26:00,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12700, loss[loss=0.1182, beats_loss=0.01014, ecapa_loss=0.0002201, whisper_loss=0.1058, over 21700.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0001972, whisper_loss=0.0938, over 3904232.58 frames. ], batch size: 87, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:26:03,152 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 14:26:12,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-11 14:26:27,180 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 14:26:30,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1141670.0, ans=10.0 2024-08-11 14:26:35,507 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:26:46,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1141770.0, ans=0.0 2024-08-11 14:26:51,722 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 14:27:00,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1141870.0, ans=0.07 2024-08-11 14:27:02,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1141870.0, ans=0.125 2024-08-11 14:27:06,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.506e+05 2024-08-11 14:27:08,115 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-11 14:27:10,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12750, loss[loss=0.09781, beats_loss=0.01544, ecapa_loss=0.0001926, whisper_loss=0.08044, over 22457.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01151, ecapa_loss=0.0001977, whisper_loss=0.09316, over 3892826.15 frames. ], batch size: 94, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:27:10,742 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 38 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 14:27:11,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1141970.0, ans=0.125 2024-08-11 14:27:17,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.661e+01 2.986e+01 3.443e+01 7.051e+01, threshold=5.972e+01, percent-clipped=1.0 2024-08-11 14:27:30,438 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 14:27:36,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1142070.0, ans=0.1 2024-08-11 14:27:42,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. 
limit=15.0 2024-08-11 14:27:43,345 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 14:27:54,495 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 14:28:12,624 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 14:28:20,504 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12800, loss[loss=0.1036, beats_loss=0.0122, ecapa_loss=0.0001697, whisper_loss=0.0897, over 22452.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01152, ecapa_loss=0.0001988, whisper_loss=0.0925, over 3881024.79 frames. ], batch size: 89, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:28:32,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1142470.0, ans=0.125 2024-08-11 14:28:52,575 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 14:29:18,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-11 14:29:29,282 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 14:29:30,900 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 14:29:31,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12850, loss[loss=0.08809, beats_loss=0.01244, ecapa_loss=0.0002219, whisper_loss=0.07343, over 18643.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01152, ecapa_loss=0.0001992, whisper_loss=0.09201, over 3870006.99 frames. 
], batch size: 81, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:29:32,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1142970.0, ans=0.125 2024-08-11 14:29:38,557 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.679e+01 2.923e+01 3.402e+01 6.033e+01, threshold=5.846e+01, percent-clipped=2.0 2024-08-11 14:29:40,139 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 14:29:43,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1142970.0, ans=0.125 2024-08-11 14:29:56,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-11 14:30:00,036 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-11 14:30:03,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-11 14:30:05,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1143170.0, ans=0.04949747468305833 2024-08-11 14:30:17,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1143270.0, ans=0.125 2024-08-11 14:30:25,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1143270.0, ans=10.0 2024-08-11 14:30:28,879 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 14:30:39,745 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 14:30:40,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12900, loss[loss=0.111, beats_loss=0.0113, ecapa_loss=0.0001774, whisper_loss=0.09792, over 23544.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01159, ecapa_loss=0.0001973, whisper_loss=0.0916, over 3860835.08 frames. ], batch size: 91, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:30:42,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1143470.0, ans=0.07 2024-08-11 14:31:05,278 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 14:31:12,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1143670.0, ans=0.0 2024-08-11 14:31:18,820 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.710e+05 2024-08-11 14:31:21,089 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 14:31:24,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1143770.0, ans=0.0 2024-08-11 14:31:25,290 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 14:31:29,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1143770.0, ans=0.125 2024-08-11 14:31:48,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 12950, loss[loss=0.09451, beats_loss=0.01222, ecapa_loss=0.0001946, whisper_loss=0.08034, over 14032.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01146, ecapa_loss=0.000198, whisper_loss=0.09193, over 3835010.90 frames. 
], batch size: 56, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:31:49,805 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-11 14:31:49,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1143970.0, ans=0.125 2024-08-11 14:31:54,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.619e+01 2.896e+01 3.261e+01 4.562e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 14:31:58,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2024-08-11 14:32:04,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1144070.0, ans=0.125 2024-08-11 14:32:21,472 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 14:32:54,043 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 14:32:55,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13000, loss[loss=0.09207, beats_loss=0.01075, ecapa_loss=0.0001978, whisper_loss=0.07934, over 18002.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01148, ecapa_loss=0.0001984, whisper_loss=0.09174, over 3839754.35 frames. ], batch size: 71, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:33:01,800 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 14:33:06,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1144470.0, ans=0.125 2024-08-11 14:33:18,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. 
limit=15.0 2024-08-11 14:33:19,352 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 14:33:23,408 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 14:33:35,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-11 14:33:42,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-11 14:33:45,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1144770.0, ans=0.125 2024-08-11 14:34:01,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13050, loss[loss=0.09649, beats_loss=0.01232, ecapa_loss=0.0001251, whisper_loss=0.08291, over 15318.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0114, ecapa_loss=0.0001992, whisper_loss=0.09205, over 3847941.22 frames. ], batch size: 54, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:34:09,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.663e+01 3.009e+01 3.543e+01 5.736e+01, threshold=6.018e+01, percent-clipped=0.0 2024-08-11 14:34:13,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1145070.0, ans=0.1 2024-08-11 14:34:27,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.04 vs. 
limit=15.0 2024-08-11 14:34:30,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1145170.0, ans=0.125 2024-08-11 14:35:08,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13100, loss[loss=0.1058, beats_loss=0.01371, ecapa_loss=0.0002008, whisper_loss=0.09008, over 22034.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01144, ecapa_loss=0.0002001, whisper_loss=0.09216, over 3887779.93 frames. ], batch size: 89, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:35:24,649 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 17 from Vox, 47 from AS 2024-08-11 14:35:26,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1145570.0, ans=0.1 2024-08-11 14:35:34,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1145670.0, ans=0.2 2024-08-11 14:35:57,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1145770.0, ans=0.125 2024-08-11 14:36:09,537 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 from AS 2024-08-11 14:36:12,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1145870.0, ans=0.0 2024-08-11 14:36:15,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1145970.0, ans=0.05 2024-08-11 14:36:16,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13150, loss[loss=0.1193, beats_loss=0.00854, ecapa_loss=0.0002211, whisper_loss=0.1086, over 20941.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01146, ecapa_loss=0.0001989, whisper_loss=0.09233, over 3907531.40 frames. 
], batch size: 81, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:36:21,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1145970.0, ans=0.125 2024-08-11 14:36:21,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1145970.0, ans=0.125 2024-08-11 14:36:24,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.646e+01 3.074e+01 3.551e+01 7.415e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-11 14:36:26,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1145970.0, ans=0.05 2024-08-11 14:36:27,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1145970.0, ans=0.0 2024-08-11 14:36:38,084 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 from AS 2024-08-11 14:36:39,457 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 14:36:43,748 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 from AS 2024-08-11 14:36:49,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-11 14:36:58,913 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS 2024-08-11 14:37:00,185 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 16 from Vox, 37 from AS 2024-08-11 14:37:20,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1146370.0, ans=0.1 2024-08-11 14:37:25,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13200, loss[loss=0.1269, beats_loss=0.01189, ecapa_loss=0.0001834, whisper_loss=0.1132, over 22739.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01141, ecapa_loss=0.0001983, whisper_loss=0.09266, over 3883366.96 frames. ], batch size: 90, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:37:31,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1146470.0, ans=0.125 2024-08-11 14:37:39,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2024-08-11 14:37:43,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1146570.0, ans=0.0 2024-08-11 14:37:59,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1146670.0, ans=0.125 2024-08-11 14:38:06,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-11 14:38:14,590 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 14:38:28,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1146870.0, ans=0.125 2024-08-11 14:38:31,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13250, loss[loss=0.08992, beats_loss=0.01052, ecapa_loss=0.0002733, whisper_loss=0.07667, over 20637.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0002, whisper_loss=0.09277, over 3899598.64 frames. ], batch size: 90, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:38:39,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1146970.0, ans=0.0 2024-08-11 14:38:39,875 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.719e+01 3.002e+01 3.497e+01 5.724e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 14:38:51,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. limit=10.0 2024-08-11 14:38:54,475 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 from AS 2024-08-11 14:38:58,468 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 14:38:58,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1147170.0, ans=0.2 2024-08-11 14:39:08,112 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 13 from LS+wenet, 26 from Vox, 31 from AS 2024-08-11 14:39:13,546 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 11 from Vox, 28 from AS 2024-08-11 14:39:38,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13300, loss[loss=0.1214, beats_loss=0.009449, ecapa_loss=0.0002293, whisper_loss=0.1096, over 15085.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01136, ecapa_loss=0.0002007, whisper_loss=0.09279, over 3853943.69 frames. ], batch size: 61, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:39:43,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1147470.0, ans=0.125 2024-08-11 14:39:47,044 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 14:40:08,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1147670.0, ans=0.0 2024-08-11 14:40:15,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1147670.0, ans=0.125 2024-08-11 14:40:24,899 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 from AS 2024-08-11 14:40:36,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1147870.0, ans=0.125 2024-08-11 14:40:36,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1147870.0, ans=0.125 2024-08-11 14:40:44,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13350, loss[loss=0.1085, beats_loss=0.008906, ecapa_loss=0.0002658, whisper_loss=0.09695, over 14157.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01136, ecapa_loss=0.0002014, whisper_loss=0.09281, over 3873653.26 frames. 
], batch size: 58, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:40:52,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1147970.0, ans=0.2 2024-08-11 14:40:53,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.881e+01 3.191e+01 3.673e+01 5.435e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 14:40:53,848 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.849e-02 2024-08-11 14:40:57,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1148070.0, ans=0.1 2024-08-11 14:41:14,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1148170.0, ans=0.1 2024-08-11 14:41:29,664 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS 2024-08-11 14:41:30,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1148270.0, ans=0.125 2024-08-11 14:41:46,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1148370.0, ans=0.125 2024-08-11 14:41:47,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-08-11 14:41:52,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13400, loss[loss=0.08149, beats_loss=0.01208, ecapa_loss=0.000214, whisper_loss=0.06727, over 14862.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01124, ecapa_loss=0.0002019, whisper_loss=0.09362, over 3864497.49 frames. 
], batch size: 64, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:42:03,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1148470.0, ans=0.95 2024-08-11 14:42:10,239 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.616e+05 2024-08-11 14:42:11,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1148570.0, ans=0.125 2024-08-11 14:42:27,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-08-11 14:42:29,822 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS 2024-08-11 14:42:39,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1148770.0, ans=0.125 2024-08-11 14:42:59,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13450, loss[loss=0.1108, beats_loss=0.009559, ecapa_loss=0.0002169, whisper_loss=0.09908, over 21663.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.0002, whisper_loss=0.09284, over 3874771.58 frames. ], batch size: 91, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:43:03,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1148970.0, ans=0.125 2024-08-11 14:43:06,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.672e+01 2.998e+01 3.496e+01 5.811e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 14:43:12,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1149070.0, ans=0.035 2024-08-11 14:43:32,607 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
30 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 14:43:40,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-11 14:43:49,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1149270.0, ans=0.125 2024-08-11 14:44:03,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-11 14:44:06,858 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13500, loss[loss=0.1074, beats_loss=0.01011, ecapa_loss=0.0001629, whisper_loss=0.09568, over 15276.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01122, ecapa_loss=0.0001995, whisper_loss=0.0938, over 3917162.38 frames. ], batch size: 59, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:44:09,559 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 14:44:11,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1149470.0, ans=0.2 2024-08-11 14:44:12,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.74 vs. limit=22.5 2024-08-11 14:44:28,829 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-11 14:44:31,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1149570.0, ans=0.0 2024-08-11 14:44:37,947 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 24 from Vox, 28 from AS 2024-08-11 14:45:04,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1149870.0, ans=0.2 2024-08-11 14:45:13,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13550, loss[loss=0.1061, beats_loss=0.01114, ecapa_loss=0.0001602, whisper_loss=0.09337, over 15608.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01119, ecapa_loss=0.0001995, whisper_loss=0.09401, over 3881012.83 frames. ], batch size: 59, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:45:22,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.724e+01 3.026e+01 3.356e+01 6.368e+01, threshold=6.052e+01, percent-clipped=1.0 2024-08-11 14:45:23,456 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 13 from Vox, 32 from AS 2024-08-11 14:45:23,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1149970.0, ans=0.07 2024-08-11 14:45:56,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1150270.0, ans=0.125 2024-08-11 14:46:14,082 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 14:46:20,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13600, loss[loss=0.08904, beats_loss=0.01271, ecapa_loss=0.000204, whisper_loss=0.07429, over 21395.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0112, ecapa_loss=0.0002001, whisper_loss=0.09388, over 3894973.23 frames. ], batch size: 88, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:46:36,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0 2024-08-11 14:46:48,929 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 25 from Vox, 31 from AS 2024-08-11 14:46:52,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2024-08-11 14:47:18,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1150870.0, ans=0.0 2024-08-11 14:47:27,221 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13650, loss[loss=0.1148, beats_loss=0.01187, ecapa_loss=0.0001786, whisper_loss=0.1011, over 20781.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0002019, whisper_loss=0.09407, over 3918688.36 frames. ], batch size: 81, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:47:30,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2024-08-11 14:47:34,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.952e+01 3.395e+01 3.813e+01 5.359e+01, threshold=6.790e+01, percent-clipped=0.0 2024-08-11 14:47:44,686 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.193e-02 2024-08-11 14:48:00,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-08-11 14:48:04,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1151170.0, ans=0.0 2024-08-11 14:48:09,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1151270.0, ans=0.125 2024-08-11 14:48:11,638 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
28 from LS+wenet, 16 from Vox, 20 from AS 2024-08-11 14:48:14,241 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 14:48:19,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1151270.0, ans=0.125 2024-08-11 14:48:26,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1151370.0, ans=0.125 2024-08-11 14:48:30,450 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS 2024-08-11 14:48:34,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13700, loss[loss=0.09301, beats_loss=0.01235, ecapa_loss=0.0001964, whisper_loss=0.0787, over 18192.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01127, ecapa_loss=0.0002009, whisper_loss=0.09387, over 3931686.09 frames. ], batch size: 75, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:48:40,203 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 from AS 2024-08-11 14:48:57,282 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 14:49:02,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5 2024-08-11 14:49:06,411 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 14:49:17,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. 
limit=15.0 2024-08-11 14:49:24,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1151770.0, ans=0.05 2024-08-11 14:49:24,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1151770.0, ans=0.1 2024-08-11 14:49:41,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13750, loss[loss=0.1315, beats_loss=0.01068, ecapa_loss=0.0001949, whisper_loss=0.1189, over 23799.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01126, ecapa_loss=0.0002004, whisper_loss=0.09369, over 3886390.61 frames. ], batch size: 92, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:49:47,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1151970.0, ans=0.125 2024-08-11 14:49:49,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.564e+01 2.884e+01 3.394e+01 1.263e+02, threshold=5.769e+01, percent-clipped=1.0 2024-08-11 14:49:53,806 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 from AS 2024-08-11 14:50:09,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1152170.0, ans=0.125 2024-08-11 14:50:09,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-08-11 14:50:14,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1152170.0, ans=0.2 2024-08-11 14:50:16,506 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 35 from Vox, 28 from AS 2024-08-11 14:50:26,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-08-11 14:50:43,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1152370.0, ans=0.0 2024-08-11 14:50:44,812 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 from AS 2024-08-11 14:50:48,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13800, loss[loss=0.08662, beats_loss=0.01248, ecapa_loss=0.0001824, whisper_loss=0.07231, over 22469.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01132, ecapa_loss=0.000199, whisper_loss=0.09316, over 3899950.93 frames. ], batch size: 90, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:50:49,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1152470.0, ans=0.0 2024-08-11 14:50:53,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1152470.0, ans=0.025 2024-08-11 14:51:02,038 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS 2024-08-11 14:51:09,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1152570.0, ans=0.125 2024-08-11 14:51:10,193 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 14:51:27,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1152770.0, ans=0.0 2024-08-11 14:51:28,964 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 14:51:31,510 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 20 from Vox, 37 from AS 2024-08-11 14:51:33,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1152770.0, ans=0.125 2024-08-11 14:51:33,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1152770.0, ans=0.125 2024-08-11 14:51:55,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13850, loss[loss=0.1204, beats_loss=0.01061, ecapa_loss=0.0001493, whisper_loss=0.1083, over 22086.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01123, ecapa_loss=0.0001987, whisper_loss=0.0937, over 3917067.81 frames. ], batch size: 84, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:51:55,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1152970.0, ans=0.125 2024-08-11 14:51:58,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1152970.0, ans=0.125 2024-08-11 14:51:59,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1152970.0, ans=0.125 2024-08-11 14:52:02,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1152970.0, ans=0.125 2024-08-11 14:52:03,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.659e+01 3.124e+01 3.574e+01 6.862e+01, threshold=6.248e+01, percent-clipped=1.0 2024-08-11 14:52:15,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. 
limit=22.5 2024-08-11 14:52:43,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1153270.0, ans=0.2 2024-08-11 14:52:45,701 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 24 from Vox, 45 from AS 2024-08-11 14:52:47,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1153370.0, ans=0.2 2024-08-11 14:52:53,649 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 17 from LS+wenet, 24 from Vox, 52 from AS 2024-08-11 14:53:01,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13900, loss[loss=0.1206, beats_loss=0.008559, ecapa_loss=0.0002515, whisper_loss=0.1095, over 17141.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01117, ecapa_loss=0.0001983, whisper_loss=0.09426, over 3911143.09 frames. ], batch size: 71, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:53:10,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2024-08-11 14:53:14,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. 
limit=15.0 2024-08-11 14:53:19,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1153570.0, ans=0.07 2024-08-11 14:53:21,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1153570.0, ans=0.125 2024-08-11 14:53:24,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1153570.0, ans=0.125 2024-08-11 14:53:28,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1153670.0, ans=0.1 2024-08-11 14:53:41,501 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 from AS 2024-08-11 14:53:49,447 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 from AS 2024-08-11 14:54:01,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1153870.0, ans=0.0 2024-08-11 14:54:05,044 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 from AS 2024-08-11 14:54:07,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 13950, loss[loss=0.1168, beats_loss=0.01182, ecapa_loss=0.0001491, whisper_loss=0.1035, over 16676.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01122, ecapa_loss=0.0001973, whisper_loss=0.09425, over 3893929.34 frames. ], batch size: 62, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:54:15,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.781e+01 3.096e+01 3.577e+01 5.485e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 14:54:19,903 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 14 from Vox, 44 from AS 2024-08-11 14:54:24,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1154070.0, ans=0.0 2024-08-11 14:54:25,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1154070.0, ans=0.1 2024-08-11 14:54:30,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0 2024-08-11 14:54:37,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1154170.0, ans=0.125 2024-08-11 14:54:42,611 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 14:54:48,516 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-11 14:54:49,850 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 from AS 2024-08-11 14:54:54,355 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-11 14:54:54,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=1154270.0, ans=0.2 2024-08-11 14:54:57,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1154270.0, ans=0.035 2024-08-11 14:54:59,677 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 from AS 2024-08-11 14:55:16,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14000, loss[loss=0.117, beats_loss=0.008363, ecapa_loss=0.0001751, whisper_loss=0.1069, over 16712.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01131, ecapa_loss=0.0001949, whisper_loss=0.09395, over 3884124.29 frames. 
], batch size: 63, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:55:18,027 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS 2024-08-11 14:55:20,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1154470.0, ans=0.0 2024-08-11 14:55:45,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1154670.0, ans=0.2 2024-08-11 14:55:54,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1154670.0, ans=0.0 2024-08-11 14:55:57,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1154770.0, ans=0.0 2024-08-11 14:55:57,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1154770.0, ans=0.2 2024-08-11 14:56:03,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1154770.0, ans=0.05 2024-08-11 14:56:04,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1154770.0, ans=0.0 2024-08-11 14:56:09,025 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 from AS 2024-08-11 14:56:27,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14050, loss[loss=0.1083, beats_loss=0.01066, ecapa_loss=0.0002381, whisper_loss=0.09523, over 14627.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01128, ecapa_loss=0.0001958, whisper_loss=0.0935, over 3879512.56 frames. 
], batch size: 60, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:56:36,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.747e+01 3.034e+01 3.556e+01 6.486e+01, threshold=6.067e+01, percent-clipped=1.0 2024-08-11 14:56:38,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1154970.0, ans=0.0 2024-08-11 14:56:39,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:56:45,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0 2024-08-11 14:56:59,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1155170.0, ans=0.5 2024-08-11 14:57:03,306 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 14:57:09,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1155170.0, ans=0.07 2024-08-11 14:57:17,762 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 14:57:31,336 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 14:57:34,146 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 14:57:40,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1155370.0, ans=0.125 2024-08-11 14:57:40,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1155370.0, ans=0.1 2024-08-11 14:57:43,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14100, loss[loss=0.1141, beats_loss=0.01218, ecapa_loss=0.0001998, whisper_loss=0.09987, over 23147.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001961, whisper_loss=0.09324, over 3851236.14 frames. ], batch size: 94, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:58:00,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1155570.0, ans=0.125 2024-08-11 14:58:05,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1155570.0, ans=0.125 2024-08-11 14:58:14,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1155670.0, ans=0.0 2024-08-11 14:58:17,998 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 14:58:26,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-08-11 14:58:31,038 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 14:58:40,129 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 14:58:42,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1155770.0, ans=0.2 2024-08-11 14:58:43,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5 2024-08-11 14:58:44,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1155870.0, ans=0.2 2024-08-11 14:58:55,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1155870.0, ans=15.0 2024-08-11 14:58:59,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14150, loss[loss=0.1239, beats_loss=0.0131, ecapa_loss=0.0001595, whisper_loss=0.1092, over 24074.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01139, ecapa_loss=0.0001955, whisper_loss=0.09348, over 3869790.35 frames. ], batch size: 88, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:59:08,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.682e+01 3.045e+01 3.525e+01 6.405e+01, threshold=6.090e+01, percent-clipped=1.0 2024-08-11 14:59:18,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1156070.0, ans=0.125 2024-08-11 14:59:21,856 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:59:28,888 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 14:59:38,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1156170.0, ans=0.125 2024-08-11 14:59:42,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1156170.0, ans=0.125 2024-08-11 14:59:44,040 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 15:00:17,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14200, loss[loss=0.1237, beats_loss=0.008279, ecapa_loss=0.0002347, whisper_loss=0.1131, over 14993.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001949, whisper_loss=0.09322, over 3876288.42 frames. ], batch size: 59, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:00:18,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2024-08-11 15:00:23,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1156470.0, ans=0.125 2024-08-11 15:00:25,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1156470.0, ans=0.125 2024-08-11 15:00:39,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1156570.0, ans=0.125 2024-08-11 15:00:40,247 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 15:00:45,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. limit=10.0 2024-08-11 15:00:46,525 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 15:00:49,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1156670.0, ans=0.07 2024-08-11 15:01:05,367 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 27 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 15:01:06,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-08-11 15:01:21,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1156870.0, ans=15.0 2024-08-11 15:01:27,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1156870.0, ans=0.1 2024-08-11 15:01:32,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14250, loss[loss=0.1257, beats_loss=0.01185, ecapa_loss=0.0001465, whisper_loss=0.1124, over 25066.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0001949, whisper_loss=0.09364, over 3891007.67 frames. ], batch size: 93, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:01:39,988 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 15:01:43,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.820e+01 3.214e+01 3.813e+01 8.671e+01, threshold=6.428e+01, percent-clipped=3.0 2024-08-11 15:01:47,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1156970.0, ans=0.09899494936611666 2024-08-11 15:01:59,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1157070.0, ans=0.0 2024-08-11 15:02:09,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1157170.0, ans=0.07 2024-08-11 15:02:12,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-11 15:02:34,824 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 15:02:44,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1157370.0, ans=0.0 2024-08-11 15:02:46,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157370.0, ans=0.1 2024-08-11 15:02:47,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-08-11 15:02:52,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14300, loss[loss=0.09764, beats_loss=0.01022, ecapa_loss=0.0002, whisper_loss=0.08542, over 13357.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01136, ecapa_loss=0.000196, whisper_loss=0.09342, over 3913177.69 frames. 
], batch size: 53, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:02:54,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1157470.0, ans=0.2 2024-08-11 15:02:58,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=12.0 2024-08-11 15:03:06,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-11 15:03:06,815 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 15:03:15,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1157570.0, ans=0.0 2024-08-11 15:03:26,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1157670.0, ans=0.1 2024-08-11 15:03:31,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1157670.0, ans=0.0 2024-08-11 15:03:38,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-11 15:03:45,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1157770.0, ans=0.0 2024-08-11 15:03:50,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.28 vs. limit=10.0 2024-08-11 15:04:01,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. 
limit=10.0 2024-08-11 15:04:07,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14350, loss[loss=0.103, beats_loss=0.01206, ecapa_loss=0.0002132, whisper_loss=0.08877, over 17261.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001958, whisper_loss=0.09315, over 3883826.40 frames. ], batch size: 71, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:04:14,237 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:04:16,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.896e+01 3.266e+01 3.801e+01 1.000e+02, threshold=6.532e+01, percent-clipped=2.0 2024-08-11 15:04:26,020 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 15:04:26,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1158070.0, ans=0.2 2024-08-11 15:04:31,697 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 15:04:40,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1158170.0, ans=0.0 2024-08-11 15:04:42,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1158170.0, ans=0.0 2024-08-11 15:04:51,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1158270.0, ans=0.1 2024-08-11 15:04:54,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1158270.0, ans=0.05 2024-08-11 15:05:10,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.16 vs. 
limit=15.0 2024-08-11 15:05:13,319 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 15:05:20,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2024-08-11 15:05:21,163 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 15:05:23,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14400, loss[loss=0.1044, beats_loss=0.009018, ecapa_loss=0.0001916, whisper_loss=0.09342, over 15147.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.0001973, whisper_loss=0.09301, over 3893467.69 frames. ], batch size: 59, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:05:38,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1158570.0, ans=0.125 2024-08-11 15:05:47,233 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 15:05:50,291 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 15:05:54,724 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 15:06:00,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=12.0 2024-08-11 15:06:11,718 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 15:06:32,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. 
limit=6.0 2024-08-11 15:06:39,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 8, batch 14450, loss[loss=0.08313, beats_loss=0.0136, ecapa_loss=0.0001993, whisper_loss=0.06754, over 13076.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01143, ecapa_loss=0.0001986, whisper_loss=0.09243, over 3837438.66 frames. ], batch size: 53, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:06:48,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.733e+01 3.088e+01 3.504e+01 7.570e+01, threshold=6.176e+01, percent-clipped=1.0 2024-08-11 15:06:52,111 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 15:07:04,079 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 15:07:16,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1159170.0, ans=0.2 2024-08-11 15:07:29,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1159270.0, ans=0.125 2024-08-11 15:07:31,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-11 15:08:19,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 0, loss[loss=0.09512, beats_loss=0.0113, ecapa_loss=0.0001608, whisper_loss=0.08222, over 15975.00 frames. ], tot_loss[loss=0.09512, beats_loss=0.0113, ecapa_loss=0.0001608, whisper_loss=0.08222, over 15975.00 frames. ], batch size: 58, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:08:19,712 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 15:08:56,610 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006493, whisper_loss=0.2513, over 922467.00 frames. 
2024-08-11 15:09:15,587 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on SV_voxceleb1: loss=0.005328, beats_loss=0, ecapa_loss=0.0005328, whisper_loss=0, over 939242.00 frames. 2024-08-11 15:11:18,953 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on AT_audioset: loss=0.0249, beats_loss=0.0249, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 15:11:18,961 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 15:11:55,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-11 15:11:56,338 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 15:12:29,746 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 15:13:43,371 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 15:14:07,960 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 15:14:10,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1159780.0, ans=0.2 2024-08-11 15:14:32,805 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 50, loss[loss=0.1108, beats_loss=0.009323, ecapa_loss=0.0002389, whisper_loss=0.09909, over 21721.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01102, ecapa_loss=0.0002144, whisper_loss=0.08809, over 862671.01 frames. ], batch size: 88, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:14:52,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1159880.0, ans=0.0 2024-08-11 15:15:14,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.45 vs. 
limit=15.0 2024-08-11 15:15:20,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1159880.0, ans=0.2 2024-08-11 15:15:26,743 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 15:15:51,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1159980.0, ans=0.125 2024-08-11 15:15:52,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.906e+01 3.207e+01 3.715e+01 5.089e+01, threshold=6.415e+01, percent-clipped=0.0 2024-08-11 15:16:16,428 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 15:16:49,274 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 15:17:37,411 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 15:18:39,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1160280.0, ans=0.2 2024-08-11 15:18:56,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=12.0 2024-08-11 15:19:03,936 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 100, loss[loss=0.1244, beats_loss=0.007697, ecapa_loss=0.0002294, whisper_loss=0.1144, over 22199.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01059, ecapa_loss=0.0002081, whisper_loss=0.09232, over 1510420.16 frames. 
], batch size: 87, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:19:24,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1160380.0, ans=0.125 2024-08-11 15:19:24,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0 2024-08-11 15:20:01,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:12,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:41,050 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 15:20:54,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1160580.0, ans=0.125 2024-08-11 15:21:22,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1160680.0, ans=0.125 2024-08-11 15:21:48,663 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 15:21:50,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1160780.0, ans=0.1 2024-08-11 15:22:03,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 150, loss[loss=0.1236, beats_loss=0.007514, ecapa_loss=0.000163, whisper_loss=0.1144, over 19582.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01071, ecapa_loss=0.0002037, whisper_loss=0.0926, over 2012217.67 frames. ], batch size: 70, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:22:09,025 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 15:22:19,777 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 15:22:34,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160980.0, ans=0.1 2024-08-11 15:22:43,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.986e+01 3.190e+01 3.682e+01 6.515e+01, threshold=6.380e+01, percent-clipped=1.0 2024-08-11 15:23:03,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2024-08-11 15:23:08,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1161080.0, ans=0.125 2024-08-11 15:23:20,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=22.5 2024-08-11 15:23:34,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1161180.0, ans=6.0 2024-08-11 15:23:39,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1161280.0, ans=0.5 2024-08-11 15:24:02,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 200, loss[loss=0.1059, beats_loss=0.01159, ecapa_loss=0.0002244, whisper_loss=0.09204, over 16453.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001999, whisper_loss=0.09189, over 2412914.83 frames. 
], batch size: 66, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:24:04,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1161380.0, ans=0.125 2024-08-11 15:24:08,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1161380.0, ans=0.125 2024-08-11 15:24:13,446 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 15:24:20,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1161480.0, ans=0.125 2024-08-11 15:24:43,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1161580.0, ans=0.125 2024-08-11 15:24:48,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1161580.0, ans=0.125 2024-08-11 15:25:01,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1161680.0, ans=0.125 2024-08-11 15:25:19,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1161780.0, ans=0.125 2024-08-11 15:25:23,196 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 15:25:29,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.03 vs. 
limit=22.5 2024-08-11 15:25:30,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1161780.0, ans=0.1 2024-08-11 15:25:35,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 250, loss[loss=0.1116, beats_loss=0.01091, ecapa_loss=0.0001637, whisper_loss=0.09903, over 19601.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001965, whisper_loss=0.09281, over 2775284.85 frames. ], batch size: 77, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:26:05,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.655e+01 2.964e+01 3.308e+01 4.229e+01, threshold=5.928e+01, percent-clipped=0.0 2024-08-11 15:26:58,468 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 15:27:01,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1162280.0, ans=0.0 2024-08-11 15:27:06,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1162280.0, ans=0.2 2024-08-11 15:27:07,821 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 15:27:08,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1162280.0, ans=0.2 2024-08-11 15:27:21,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 300, loss[loss=0.1095, beats_loss=0.01295, ecapa_loss=0.0001875, whisper_loss=0.09469, over 22804.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.000195, whisper_loss=0.09229, over 2981269.38 frames. 
], batch size: 92, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:27:43,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1162480.0, ans=0.125 2024-08-11 15:27:44,165 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 15:27:48,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1162480.0, ans=0.2 2024-08-11 15:27:50,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1162480.0, ans=0.125 2024-08-11 15:27:58,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1162580.0, ans=0.125 2024-08-11 15:28:39,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 350, loss[loss=0.111, beats_loss=0.01157, ecapa_loss=0.0001902, whisper_loss=0.0975, over 16038.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001951, whisper_loss=0.09178, over 3159445.57 frames. ], batch size: 64, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:28:42,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1162880.0, ans=0.2 2024-08-11 15:28:43,939 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 15:28:53,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1162980.0, ans=0.125 2024-08-11 15:28:55,483 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:29:00,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.571e+01 3.026e+01 3.460e+01 5.079e+01, threshold=6.051e+01, percent-clipped=0.0 2024-08-11 15:29:11,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2024-08-11 15:29:11,960 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 15:29:30,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1163180.0, ans=0.125 2024-08-11 15:29:50,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 400, loss[loss=0.1061, beats_loss=0.009565, ecapa_loss=0.0001767, whisper_loss=0.09476, over 18294.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001946, whisper_loss=0.09151, over 3312999.87 frames. ], batch size: 66, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:29:57,899 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 15:30:28,905 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 15:30:30,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1163580.0, ans=0.0 2024-08-11 15:30:33,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1163680.0, ans=0.1 2024-08-11 15:31:01,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 450, loss[loss=0.09555, beats_loss=0.01046, ecapa_loss=0.0002068, whisper_loss=0.08303, over 19062.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001935, whisper_loss=0.09217, over 3431273.49 frames. ], batch size: 75, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:31:15,612 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 15:31:22,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.636e+01 2.915e+01 3.353e+01 5.482e+01, threshold=5.829e+01, percent-clipped=0.0 2024-08-11 15:31:33,569 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 15:31:35,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1164080.0, ans=0.125 2024-08-11 15:31:53,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1164180.0, ans=0.125 2024-08-11 15:32:01,330 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 15:32:12,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1164280.0, ans=0.1 2024-08-11 15:32:14,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 500, loss[loss=0.0884, beats_loss=0.01151, ecapa_loss=0.0001767, whisper_loss=0.07513, over 15725.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01097, ecapa_loss=0.0001924, whisper_loss=0.09202, over 3499136.39 frames. ], batch size: 63, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:32:15,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1164380.0, ans=0.0 2024-08-11 15:32:16,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1164380.0, ans=0.1 2024-08-11 15:32:21,953 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 18 from LS+wenet, 28 from Vox, 46 fro AS 2024-08-11 15:32:45,163 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 15:32:47,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=15.0 2024-08-11 15:32:48,118 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 15:32:50,843 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 15:33:02,107 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 15:33:03,333 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 15:33:10,282 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 40 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 15:33:26,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 550, loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0002185, whisper_loss=0.09208, over 19122.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001931, whisper_loss=0.09233, over 3576823.79 frames. ], batch size: 76, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:33:29,734 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 15:33:31,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1164880.0, ans=0.125 2024-08-11 15:33:42,492 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 15:33:48,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.637e+01 3.008e+01 3.365e+01 4.595e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 15:33:53,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-11 15:33:54,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1165080.0, ans=0.125 2024-08-11 15:34:04,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1165080.0, ans=0.0 2024-08-11 15:34:06,923 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 15:34:16,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1165180.0, ans=0.125 2024-08-11 15:34:21,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1165180.0, ans=0.0 2024-08-11 15:34:24,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1165280.0, ans=0.0 2024-08-11 15:34:38,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 600, loss[loss=0.1004, beats_loss=0.01135, ecapa_loss=0.0001817, whisper_loss=0.08719, over 17344.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01102, ecapa_loss=0.0001916, whisper_loss=0.09231, over 3653332.82 frames. 
], batch size: 69, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:34:59,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.09 vs. limit=15.0 2024-08-11 15:35:36,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1165780.0, ans=0.125 2024-08-11 15:35:53,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 650, loss[loss=0.1078, beats_loss=0.01313, ecapa_loss=0.0001389, whisper_loss=0.09328, over 17718.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.0001909, whisper_loss=0.09171, over 3673744.00 frames. ], batch size: 66, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:36:03,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1165880.0, ans=0.1 2024-08-11 15:36:16,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.692e+01 3.015e+01 3.566e+01 6.762e+01, threshold=6.030e+01, percent-clipped=2.0 2024-08-11 15:36:16,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1165980.0, ans=0.125 2024-08-11 15:36:40,182 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 15:36:45,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1166180.0, ans=0.0 2024-08-11 15:37:00,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1166280.0, ans=0.2 2024-08-11 15:37:05,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1166280.0, ans=0.0 2024-08-11 15:37:13,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 700, loss[loss=0.08676, beats_loss=0.01213, ecapa_loss=0.0002288, whisper_loss=0.07235, over 16728.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01108, ecapa_loss=0.0001914, whisper_loss=0.09201, over 3724406.35 frames. ], batch size: 71, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:37:19,599 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 15:37:22,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1166380.0, ans=0.0 2024-08-11 15:37:30,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-08-11 15:37:32,579 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 15:37:35,340 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 15:37:35,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1166480.0, ans=0.125 2024-08-11 15:37:41,118 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 15:37:48,061 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
32 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 15:37:50,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1166580.0, ans=10.0 2024-08-11 15:37:56,474 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-11 15:38:32,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1166780.0, ans=0.125 2024-08-11 15:38:33,387 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 15:38:35,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 750, loss[loss=0.08734, beats_loss=0.01199, ecapa_loss=0.0002268, whisper_loss=0.07308, over 16347.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.0001897, whisper_loss=0.09242, over 3739374.65 frames. ], batch size: 68, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:39:00,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.889e+01 3.485e+01 5.934e+01, threshold=5.777e+01, percent-clipped=0.0 2024-08-11 15:39:01,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1166980.0, ans=0.0 2024-08-11 15:39:11,992 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 15:39:16,676 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 15:39:23,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1167080.0, ans=0.125 2024-08-11 15:39:23,916 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 15:39:27,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1167180.0, ans=0.07 2024-08-11 15:39:59,052 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 15:40:00,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 800, loss[loss=0.09609, beats_loss=0.01163, ecapa_loss=0.0001941, whisper_loss=0.08252, over 22289.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01114, ecapa_loss=0.0001904, whisper_loss=0.09195, over 3770650.92 frames. ], batch size: 89, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:40:05,782 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-11 15:40:12,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1167380.0, ans=0.1 2024-08-11 15:40:29,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1167480.0, ans=0.07 2024-08-11 15:40:29,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.73 vs. limit=22.5 2024-08-11 15:40:44,354 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 15:40:54,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1167680.0, ans=0.125 2024-08-11 15:41:14,965 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 15:41:25,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 850, loss[loss=0.1093, beats_loss=0.01082, ecapa_loss=0.0001923, whisper_loss=0.09655, over 15166.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0001906, whisper_loss=0.09156, over 3765615.89 frames. ], batch size: 62, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:41:38,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2024-08-11 15:41:52,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2024-08-11 15:41:52,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.648e+01 3.009e+01 3.325e+01 6.049e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-11 15:42:00,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2024-08-11 15:42:13,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1168080.0, ans=0.0 2024-08-11 15:42:15,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1168180.0, ans=0.125 2024-08-11 15:42:16,457 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 15:42:35,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1168280.0, ans=0.2 2024-08-11 15:42:50,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 900, loss[loss=0.1235, beats_loss=0.01155, ecapa_loss=0.0001952, whisper_loss=0.11, over 20477.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.000189, whisper_loss=0.09151, over 3804969.15 frames. 
], batch size: 81, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:43:25,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1168580.0, ans=0.0 2024-08-11 15:43:34,860 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 15:43:41,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1168680.0, ans=0.125 2024-08-11 15:43:54,433 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 15:43:56,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-08-11 15:43:59,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2024-08-11 15:44:15,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 950, loss[loss=0.108, beats_loss=0.01192, ecapa_loss=0.0001676, whisper_loss=0.09443, over 23541.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001901, whisper_loss=0.09138, over 3811298.96 frames. 
], batch size: 89, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:44:20,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1168880.0, ans=0.09899494936611666 2024-08-11 15:44:42,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.663e+01 2.966e+01 3.403e+01 1.009e+02, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 15:44:52,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1169080.0, ans=0.09899494936611666 2024-08-11 15:44:53,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1169080.0, ans=0.125 2024-08-11 15:45:10,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.22 vs. limit=10.0 2024-08-11 15:45:36,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1169380.0, ans=0.0 2024-08-11 15:45:37,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1000, loss[loss=0.1092, beats_loss=0.01297, ecapa_loss=0.0002044, whisper_loss=0.09422, over 18043.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01117, ecapa_loss=0.0001883, whisper_loss=0.09123, over 3812510.18 frames. ], batch size: 75, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:45:40,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1169380.0, ans=0.125 2024-08-11 15:45:44,646 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 15:45:56,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1169480.0, ans=0.0 2024-08-11 15:46:05,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1169480.0, ans=0.2 2024-08-11 15:46:06,938 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 15:46:09,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1169580.0, ans=0.2 2024-08-11 15:46:29,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1169680.0, ans=0.0 2024-08-11 15:46:42,360 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 37 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 15:46:46,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1169780.0, ans=0.2 2024-08-11 15:46:51,252 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 15:47:00,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1050, loss[loss=0.112, beats_loss=0.01106, ecapa_loss=0.0001896, whisper_loss=0.09907, over 23568.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.0001888, whisper_loss=0.09151, over 3837489.00 frames. ], batch size: 93, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:47:07,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. 
limit=15.0 2024-08-11 15:47:15,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1169880.0, ans=0.125 2024-08-11 15:47:29,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.579e+01 2.847e+01 3.241e+01 6.261e+01, threshold=5.695e+01, percent-clipped=1.0 2024-08-11 15:47:30,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1169980.0, ans=0.125 2024-08-11 15:47:55,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1170180.0, ans=0.2 2024-08-11 15:48:02,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1170180.0, ans=0.125 2024-08-11 15:48:09,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1170180.0, ans=0.0 2024-08-11 15:48:10,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1170180.0, ans=0.125 2024-08-11 15:48:21,990 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 15:48:32,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1100, loss[loss=0.1203, beats_loss=0.01138, ecapa_loss=0.0001813, whisper_loss=0.1071, over 20491.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01114, ecapa_loss=0.0001874, whisper_loss=0.09214, over 3846523.20 frames. ], batch size: 77, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:48:51,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1170480.0, ans=0.125 2024-08-11 15:49:16,609 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
23 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 15:49:20,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-11 15:49:27,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-11 15:49:34,628 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 15:49:37,763 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 15:49:58,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2024-08-11 15:49:58,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1150, loss[loss=0.1322, beats_loss=0.009976, ecapa_loss=0.0001464, whisper_loss=0.1208, over 25305.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.000188, whisper_loss=0.0924, over 3868741.70 frames. ], batch size: 92, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:50:00,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1170880.0, ans=0.125 2024-08-11 15:50:03,790 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 15:50:17,823 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-11 15:50:25,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.574e+01 2.982e+01 3.415e+01 5.178e+01, threshold=5.965e+01, percent-clipped=0.0 2024-08-11 15:50:28,872 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 15:50:51,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1171180.0, ans=0.125 2024-08-11 15:50:52,400 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:50:54,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-08-11 15:51:03,210 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 15:51:03,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1171280.0, ans=0.0 2024-08-11 15:51:15,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1171280.0, ans=0.0 2024-08-11 15:51:20,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1200, loss[loss=0.08457, beats_loss=0.01451, ecapa_loss=0.0001714, whisper_loss=0.06835, over 15494.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01119, ecapa_loss=0.0001894, whisper_loss=0.09152, over 3865714.58 frames. ], batch size: 61, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:51:30,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1171380.0, ans=0.125 2024-08-11 15:52:28,931 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 15:52:31,995 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 15:52:42,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1250, loss[loss=0.104, beats_loss=0.01222, ecapa_loss=0.000196, whisper_loss=0.08985, over 21024.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01113, ecapa_loss=0.0001887, whisper_loss=0.0921, over 3860131.80 frames. ], batch size: 85, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:52:46,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1171880.0, ans=0.0 2024-08-11 15:52:56,975 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 15:53:07,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 3.089e+01 3.473e+01 5.447e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 15:53:18,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1172080.0, ans=0.0 2024-08-11 15:53:22,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1172080.0, ans=0.125 2024-08-11 15:53:25,273 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 15:53:51,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-08-11 15:54:02,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1300, loss[loss=0.09108, beats_loss=0.01037, ecapa_loss=0.0001864, whisper_loss=0.07885, over 21946.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01118, ecapa_loss=0.0001873, whisper_loss=0.09184, over 3838476.82 frames. ], batch size: 88, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:54:17,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1172480.0, ans=0.05 2024-08-11 15:54:38,358 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 15:54:41,208 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 34 from Vox, 24 fro AS 2024-08-11 15:54:47,343 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 15:55:18,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172780.0, ans=0.1 2024-08-11 15:55:22,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1350, loss[loss=0.1032, beats_loss=0.008349, ecapa_loss=0.0001693, whisper_loss=0.09316, over 14252.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.000187, whisper_loss=0.09197, over 3847159.92 frames. ], batch size: 54, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:55:30,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1172880.0, ans=0.07 2024-08-11 15:55:41,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1172980.0, ans=0.125 2024-08-11 15:55:43,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1172980.0, ans=0.2 2024-08-11 15:55:51,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.580e+01 3.028e+01 3.578e+01 5.392e+01, threshold=6.056e+01, percent-clipped=0.0 2024-08-11 15:56:01,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:01,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:04,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.63 
vs. limit=15.0 2024-08-11 15:56:08,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:11,630 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.486e-02 2024-08-11 15:56:13,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:14,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1173180.0, ans=0.125 2024-08-11 15:56:19,287 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-11 15:56:38,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1173280.0, ans=0.125 2024-08-11 15:56:43,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1173280.0, ans=0.125 2024-08-11 15:56:50,303 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1400, loss[loss=0.135, beats_loss=0.008188, ecapa_loss=0.0001998, whisper_loss=0.1249, over 23842.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001875, whisper_loss=0.09228, over 3851725.44 frames. ], batch size: 94, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:57:06,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1173480.0, ans=0.125 2024-08-11 15:57:06,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1173480.0, ans=0.07 2024-08-11 15:57:14,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.36 vs. 
limit=15.0
2024-08-11 15:57:15,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1173480.0, ans=0.0
2024-08-11 15:57:22,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2024-08-11 15:57:47,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1173680.0, ans=0.125
2024-08-11 15:57:48,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2024-08-11 15:57:51,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0
2024-08-11 15:57:52,036 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 from AS
2024-08-11 15:58:06,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5
2024-08-11 15:58:08,926 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 15:58:13,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1450, loss[loss=0.1087, beats_loss=0.008751, ecapa_loss=0.0002106, whisper_loss=0.09782, over 13710.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.0001865, whisper_loss=0.0917, over 3865062.36 frames. ], batch size: 53, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 15:59:09,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.580e+01 2.876e+01 3.331e+01 4.704e+01, threshold=5.752e+01, percent-clipped=0.0
2024-08-11 15:59:39,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1174180.0, ans=0.04949747468305833
2024-08-11 15:59:42,308 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 from AS
2024-08-11 15:59:43,178 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03
2024-08-11 16:00:06,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1500, loss[loss=0.132, beats_loss=0.008797, ecapa_loss=0.0001928, whisper_loss=0.1213, over 21618.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001864, whisper_loss=0.09208, over 3871078.04 frames. ], batch size: 80, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:00:10,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5
2024-08-11 16:00:25,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0
2024-08-11 16:00:38,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1174580.0, ans=0.1
2024-08-11 16:00:48,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1174580.0, ans=0.125
2024-08-11 16:01:00,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1174680.0, ans=0.0
2024-08-11 16:01:04,294 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS
2024-08-11 16:01:05,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1174680.0, ans=0.125
2024-08-11 16:01:11,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1174780.0, ans=0.2
2024-08-11 16:01:22,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1174780.0, ans=0.025
2024-08-11 16:01:26,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1550, loss[loss=0.1026, beats_loss=0.01285, ecapa_loss=0.0001797, whisper_loss=0.08794, over 21362.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01116, ecapa_loss=0.0001836, whisper_loss=0.0921, over 3878146.06 frames. ], batch size: 85, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:01:41,123 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 16:01:52,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.587e+01 2.923e+01 3.490e+01 5.175e+01, threshold=5.845e+01, percent-clipped=0.0
2024-08-11 16:01:54,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1174980.0, ans=0.1
2024-08-11 16:02:04,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1175080.0, ans=0.2
2024-08-11 16:02:05,149 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS
2024-08-11 16:02:21,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1175180.0, ans=0.1
2024-08-11 16:02:38,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1175280.0, ans=0.0
2024-08-11 16:02:40,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1175280.0, ans=0.125
2024-08-11 16:02:43,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1600, loss[loss=0.09089, beats_loss=0.01379, ecapa_loss=0.0002054, whisper_loss=0.07504, over 21962.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01121, ecapa_loss=0.0001832, whisper_loss=0.09148, over 3894858.23 frames. ], batch size: 93, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:02:48,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1175380.0, ans=0.125
2024-08-11 16:02:49,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1175380.0, ans=0.125
2024-08-11 16:02:50,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1175380.0, ans=0.125
2024-08-11 16:02:59,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1175480.0, ans=0.2
2024-08-11 16:03:23,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1175580.0, ans=0.125
2024-08-11 16:03:41,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1175680.0, ans=0.125
2024-08-11 16:04:00,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1650, loss[loss=0.1036, beats_loss=0.008831, ecapa_loss=0.0002418, whisper_loss=0.09239, over 19601.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01127, ecapa_loss=0.0001821, whisper_loss=0.09116, over 3876616.76 frames. ], batch size: 81, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:04:16,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1175980.0, ans=0.0
2024-08-11 16:04:25,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.491e+01 2.765e+01 3.253e+01 5.216e+01, threshold=5.529e+01, percent-clipped=0.0
2024-08-11 16:04:45,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5
2024-08-11 16:04:58,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1176180.0, ans=0.0
2024-08-11 16:04:58,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176180.0, ans=0.1
2024-08-11 16:05:00,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1176180.0, ans=0.125
2024-08-11 16:05:17,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1700, loss[loss=0.1016, beats_loss=0.01144, ecapa_loss=0.0002065, whisper_loss=0.0881, over 20131.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.0001825, whisper_loss=0.09168, over 3867476.31 frames. ], batch size: 83, lr: 7.19e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:05:26,639 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 from AS
2024-08-11 16:05:35,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1176480.0, ans=0.125
2024-08-11 16:05:43,094 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 33 from LS+wenet, 17 from Vox, 35 from AS
2024-08-11 16:05:48,480 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS
2024-08-11 16:06:00,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1176680.0, ans=0.1
2024-08-11 16:06:08,471 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 from AS
2024-08-11 16:06:15,271 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 from AS
2024-08-11 16:06:25,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0
2024-08-11 16:06:30,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1750, loss[loss=0.1321, beats_loss=0.008297, ecapa_loss=0.0002117, whisper_loss=0.1217, over 20046.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001837, whisper_loss=0.09213, over 3867297.24 frames. ], batch size: 77, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:06:49,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=12.0
2024-08-11 16:06:53,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.634e+01 3.052e+01 3.436e+01 4.631e+01, threshold=6.105e+01, percent-clipped=0.0
2024-08-11 16:07:04,088 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 16:07:06,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0
2024-08-11 16:07:12,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1177180.0, ans=0.125
2024-08-11 16:07:13,901 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 15 from Vox, 34 from AS
2024-08-11 16:07:21,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1177180.0, ans=0.125
2024-08-11 16:07:22,472 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 10 from Vox, 27 from AS
2024-08-11 16:07:32,596 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 from AS
2024-08-11 16:07:33,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1177280.0, ans=0.125
2024-08-11 16:07:38,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1177280.0, ans=0.125
2024-08-11 16:07:42,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1800, loss[loss=0.09833, beats_loss=0.01357, ecapa_loss=0.0002332, whisper_loss=0.08242, over 15732.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001855, whisper_loss=0.09244, over 3850957.32 frames. ], batch size: 67, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:07:56,673 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 from AS
2024-08-11 16:07:58,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1177480.0, ans=0.125
2024-08-11 16:07:58,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1177480.0, ans=15.0
2024-08-11 16:08:23,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1177580.0, ans=0.0
2024-08-11 16:08:26,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-08-11 16:08:28,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1177680.0, ans=0.125
2024-08-11 16:08:46,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1177780.0, ans=0.125
2024-08-11 16:08:54,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1850, loss[loss=0.08629, beats_loss=0.01259, ecapa_loss=0.0001333, whisper_loss=0.07236, over 17841.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01111, ecapa_loss=0.0001846, whisper_loss=0.09178, over 3837309.81 frames. ], batch size: 68, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:09:05,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1177880.0, ans=0.1
2024-08-11 16:09:14,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1177980.0, ans=0.125
2024-08-11 16:09:18,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.637e+01 3.046e+01 3.560e+01 5.616e+01, threshold=6.093e+01, percent-clipped=0.0
2024-08-11 16:09:31,230 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 from AS
2024-08-11 16:09:37,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1178180.0, ans=0.125
2024-08-11 16:09:41,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1178180.0, ans=0.1
2024-08-11 16:10:01,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5
2024-08-11 16:10:03,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1178280.0, ans=10.0
2024-08-11 16:10:07,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1900, loss[loss=0.1098, beats_loss=0.01053, ecapa_loss=0.0001737, whisper_loss=0.0975, over 15449.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001865, whisper_loss=0.09193, over 3812229.43 frames. ], batch size: 57, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:10:12,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1178380.0, ans=0.0
2024-08-11 16:10:31,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1178480.0, ans=0.125
2024-08-11 16:10:44,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5
2024-08-11 16:10:49,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1178580.0, ans=0.0
2024-08-11 16:10:52,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1178680.0, ans=0.125
2024-08-11 16:11:07,800 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 from AS
2024-08-11 16:11:08,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1178780.0, ans=0.0
2024-08-11 16:11:17,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1178780.0, ans=0.0
2024-08-11 16:11:22,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 1950, loss[loss=0.1019, beats_loss=0.01082, ecapa_loss=0.0002149, whisper_loss=0.08896, over 16921.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0112, ecapa_loss=0.000189, whisper_loss=0.09169, over 3815415.81 frames. ], batch size: 68, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:11:22,228 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 from AS
2024-08-11 16:11:23,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1178880.0, ans=0.0
2024-08-11 16:11:41,051 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 16:11:41,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1178980.0, ans=0.0
2024-08-11 16:11:45,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.606e+01 2.950e+01 3.514e+01 8.174e+01, threshold=5.900e+01, percent-clipped=2.0
2024-08-11 16:11:50,878 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS
2024-08-11 16:11:57,586 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 from AS
2024-08-11 16:12:01,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1179080.0, ans=0.0
2024-08-11 16:12:05,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1179180.0, ans=0.125
2024-08-11 16:12:09,271 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS
2024-08-11 16:12:20,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1179280.0, ans=0.125
2024-08-11 16:12:27,922 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 16:12:31,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1179280.0, ans=0.125
2024-08-11 16:12:36,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2000, loss[loss=0.07264, beats_loss=0.01534, ecapa_loss=0.0001714, whisper_loss=0.05558, over 17841.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01129, ecapa_loss=0.0001889, whisper_loss=0.09098, over 3822580.42 frames. ], batch size: 73, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:12:46,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1179380.0, ans=0.0
2024-08-11 16:12:54,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1179480.0, ans=0.125
2024-08-11 16:12:59,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1179480.0, ans=0.0
2024-08-11 16:13:13,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1179580.0, ans=0.5
2024-08-11 16:13:19,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1179580.0, ans=0.125
2024-08-11 16:13:23,583 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 from AS
2024-08-11 16:13:25,048 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 from AS
2024-08-11 16:13:33,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1179680.0, ans=0.125
2024-08-11 16:13:40,912 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 21 from Vox, 15 from AS
2024-08-11 16:13:53,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2050, loss[loss=0.1252, beats_loss=0.01289, ecapa_loss=0.0001543, whisper_loss=0.1108, over 22612.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01128, ecapa_loss=0.0001895, whisper_loss=0.09142, over 3821788.15 frames. ], batch size: 88, lr: 7.18e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:13:58,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179880.0, ans=0.1
2024-08-11 16:14:05,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1179880.0, ans=0.125
2024-08-11 16:14:06,844 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 from AS
2024-08-11 16:14:16,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1179980.0, ans=22.5
2024-08-11 16:14:18,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.671e+01 2.965e+01 3.227e+01 2.393e+02, threshold=5.931e+01, percent-clipped=1.0
2024-08-11 16:14:20,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179980.0, ans=0.1
2024-08-11 16:14:23,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1179980.0, ans=0.95
2024-08-11 16:14:34,164 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 19 from Vox, 43 from AS
2024-08-11 16:15:14,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2100, loss[loss=0.1182, beats_loss=0.01151, ecapa_loss=0.0001676, whisper_loss=0.105, over 22361.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01129, ecapa_loss=0.0001883, whisper_loss=0.09207, over 3826053.65 frames. ], batch size: 87, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:15:17,174 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 24 from Vox, 33 from AS
2024-08-11 16:15:38,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1180480.0, ans=0.2
2024-08-11 16:15:38,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-11 16:15:41,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1180480.0, ans=0.125
2024-08-11 16:15:45,064 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 from AS
2024-08-11 16:15:47,090 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 from AS
2024-08-11 16:15:53,892 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 16:15:54,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2024-08-11 16:15:58,205 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 from AS
2024-08-11 16:16:19,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0
2024-08-11 16:16:37,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2150, loss[loss=0.1211, beats_loss=0.01107, ecapa_loss=0.0001782, whisper_loss=0.1083, over 22497.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001893, whisper_loss=0.09269, over 3844604.82 frames. ], batch size: 87, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:16:59,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1180980.0, ans=0.2
2024-08-11 16:17:03,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.740e+01 2.984e+01 3.481e+01 5.761e+01, threshold=5.968e+01, percent-clipped=0.0
2024-08-11 16:17:13,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1181080.0, ans=0.125
2024-08-11 16:17:21,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1181080.0, ans=0.0
2024-08-11 16:17:32,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1181180.0, ans=0.125
2024-08-11 16:17:38,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0
2024-08-11 16:17:49,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1181280.0, ans=0.0
2024-08-11 16:18:02,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2200, loss[loss=0.0974, beats_loss=0.01064, ecapa_loss=0.0002058, whisper_loss=0.0847, over 20738.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01127, ecapa_loss=0.00019, whisper_loss=0.09262, over 3836869.76 frames. ], batch size: 84, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:18:32,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1181480.0, ans=0.1
2024-08-11 16:18:33,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1181580.0, ans=0.125
2024-08-11 16:18:41,517 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 from AS
2024-08-11 16:18:43,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1181580.0, ans=0.125
2024-08-11 16:18:58,872 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 from AS
2024-08-11 16:19:01,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1181680.0, ans=0.125
2024-08-11 16:19:05,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1181680.0, ans=0.125
2024-08-11 16:19:06,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0
2024-08-11 16:19:09,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1181780.0, ans=0.0
2024-08-11 16:19:10,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1181780.0, ans=0.04949747468305833
2024-08-11 16:19:10,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1181780.0, ans=0.125
2024-08-11 16:19:10,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1181780.0, ans=0.125
2024-08-11 16:19:12,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181780.0, ans=0.1
2024-08-11 16:19:14,248 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS
2024-08-11 16:19:24,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2250, loss[loss=0.09756, beats_loss=0.01249, ecapa_loss=0.0002064, whisper_loss=0.08301, over 21709.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01129, ecapa_loss=0.0001911, whisper_loss=0.09218, over 3840775.41 frames. ], batch size: 92, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:19:33,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1181880.0, ans=0.0
2024-08-11 16:19:34,503 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 22 from Vox, 23 from AS
2024-08-11 16:19:37,262 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 from AS
2024-08-11 16:19:50,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.696e+01 3.022e+01 3.450e+01 8.988e+01, threshold=6.044e+01, percent-clipped=1.0
2024-08-11 16:19:51,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2024-08-11 16:20:19,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1182180.0, ans=0.0
2024-08-11 16:20:26,868 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 12 from Vox, 30 from AS
2024-08-11 16:20:29,707 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 16:20:38,121 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 22 from Vox, 25 from AS
2024-08-11 16:20:45,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2300, loss[loss=0.11, beats_loss=0.01011, ecapa_loss=0.0002297, whisper_loss=0.09757, over 21950.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01133, ecapa_loss=0.0001934, whisper_loss=0.09247, over 3853569.02 frames. ], batch size: 88, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:20:47,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=12.0
2024-08-11 16:20:48,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1182380.0, ans=0.125
2024-08-11 16:21:06,479 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 from AS
2024-08-11 16:21:23,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2024-08-11 16:21:29,362 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 12 from Vox, 37 from AS
2024-08-11 16:21:50,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0
2024-08-11 16:21:59,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1182780.0, ans=0.1
2024-08-11 16:22:05,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2350, loss[loss=0.1045, beats_loss=0.01016, ecapa_loss=0.0001998, whisper_loss=0.09236, over 14950.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0001937, whisper_loss=0.09306, over 3876045.47 frames. ], batch size: 59, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:22:11,124 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 21 from Vox, 32 from AS
2024-08-11 16:22:18,056 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 from AS
2024-08-11 16:22:31,170 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 40 from LS+wenet, 19 from Vox, 32 from AS
2024-08-11 16:22:32,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0
2024-08-11 16:22:34,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.605e+01 2.959e+01 3.391e+01 6.517e+01, threshold=5.918e+01, percent-clipped=1.0
2024-08-11 16:22:48,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1183080.0, ans=0.0
2024-08-11 16:22:58,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1183180.0, ans=0.04949747468305833
2024-08-11 16:23:09,349 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 from AS
2024-08-11 16:23:23,166 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 16:23:30,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2400, loss[loss=0.1124, beats_loss=0.008521, ecapa_loss=0.0002466, whisper_loss=0.1014, over 19422.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0112, ecapa_loss=0.0001949, whisper_loss=0.09325, over 3891833.87 frames. ], batch size: 79, lr: 7.17e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:23:35,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5
2024-08-11 16:23:36,364 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 from AS
2024-08-11 16:23:38,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1183380.0, ans=0.125
2024-08-11 16:23:49,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1183480.0, ans=0.125
2024-08-11 16:24:23,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1183680.0, ans=0.125
2024-08-11 16:24:28,750 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 16:24:55,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2450, loss[loss=0.09258, beats_loss=0.01062, ecapa_loss=0.0001686, whisper_loss=0.08028, over 15252.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01118, ecapa_loss=0.0001936, whisper_loss=0.0935, over 3922518.79 frames. ], batch size: 55, lr: 7.16e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:24:57,469 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 from AS
2024-08-11 16:25:19,548 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 14 from Vox, 40 from AS
2024-08-11 16:25:20,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.638e+01 2.982e+01 3.407e+01 5.711e+01, threshold=5.963e+01, percent-clipped=0.0
2024-08-11 16:25:27,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1184080.0, ans=0.0
2024-08-11 16:25:33,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1184080.0, ans=0.1
2024-08-11 16:26:00,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1184180.0, ans=0.0
2024-08-11 16:26:09,640 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 16:26:15,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1184280.0, ans=22.5
2024-08-11 16:26:17,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1184380.0, ans=0.0
2024-08-11 16:26:18,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2500, loss[loss=0.1199, beats_loss=0.008833, ecapa_loss=0.0001652, whisper_loss=0.1094, over 19905.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01113, ecapa_loss=0.000194, whisper_loss=0.09371, over 3910361.59 frames. ], batch size: 72, lr: 7.16e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:26:22,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1184380.0, ans=0.0
2024-08-11 16:26:27,155 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 from AS
2024-08-11 16:26:27,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1184380.0, ans=0.2
2024-08-11 16:26:36,348 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 from AS
2024-08-11 16:26:39,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1184480.0, ans=0.125
2024-08-11 16:26:43,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1184480.0, ans=0.015
2024-08-11 16:26:48,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1184480.0, ans=0.125
2024-08-11 16:26:52,974 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 from AS
2024-08-11 16:27:09,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1184680.0, ans=0.125
2024-08-11 16:27:34,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0
2024-08-11 16:27:45,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2550, loss[loss=0.1098, beats_loss=0.01001, ecapa_loss=0.0001628, whisper_loss=0.09819, over 18141.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01113, ecapa_loss=0.000194, whisper_loss=0.09409, over 3922901.71 frames. ], batch size: 69, lr: 7.16e-03, grad_scale: 5.764607523034235e+17
2024-08-11 16:28:04,447 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
24 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 16:28:11,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.549e+01 2.871e+01 3.222e+01 4.395e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 16:28:22,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-11 16:28:39,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185180.0, ans=0.1 2024-08-11 16:28:40,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0 2024-08-11 16:29:00,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1185280.0, ans=0.125 2024-08-11 16:29:03,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1185280.0, ans=0.125 2024-08-11 16:29:06,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2024-08-11 16:29:10,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2600, loss[loss=0.1051, beats_loss=0.0114, ecapa_loss=0.0001493, whisper_loss=0.09218, over 13958.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01111, ecapa_loss=0.0001923, whisper_loss=0.09411, over 3889520.68 frames. ], batch size: 53, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:29:15,744 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 16:29:22,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2024-08-11 16:29:44,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1185580.0, ans=0.125 2024-08-11 16:30:07,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-11 16:30:22,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-11 16:30:32,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-08-11 16:30:33,410 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-11 16:30:34,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2650, loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0002525, whisper_loss=0.08927, over 20716.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01109, ecapa_loss=0.000193, whisper_loss=0.09467, over 3888578.06 frames. ], batch size: 87, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:30:34,943 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 16:30:37,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1185880.0, ans=0.125 2024-08-11 16:30:41,623 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 16:30:46,401 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 16:30:48,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1185880.0, ans=0.2 2024-08-11 16:31:01,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.668e+01 2.978e+01 3.517e+01 4.989e+01, threshold=5.956e+01, percent-clipped=0.0 2024-08-11 16:31:02,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-11 16:31:09,971 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 16:31:35,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1186180.0, ans=0.07 2024-08-11 16:31:53,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1186280.0, ans=0.125 2024-08-11 16:31:53,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-11 16:31:54,417 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 16:31:56,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1186280.0, ans=0.1 2024-08-11 16:31:56,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1186280.0, ans=0.1 2024-08-11 16:31:58,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1186380.0, ans=15.0 2024-08-11 16:31:59,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2700, loss[loss=0.09938, beats_loss=0.01074, ecapa_loss=0.0001957, whisper_loss=0.08668, over 22849.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01114, ecapa_loss=0.0001924, whisper_loss=0.09432, over 3925596.38 frames. ], batch size: 89, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:32:01,187 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 16:32:33,727 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 16:32:51,854 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 16:32:59,294 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 16:33:01,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1186680.0, ans=0.2 2024-08-11 16:33:08,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1186780.0, ans=0.2 2024-08-11 16:33:20,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2750, loss[loss=0.1076, beats_loss=0.01141, ecapa_loss=0.0001676, whisper_loss=0.09449, over 22763.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01124, ecapa_loss=0.0001902, whisper_loss=0.09351, over 3897267.74 frames. ], batch size: 92, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:33:29,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1186880.0, ans=0.2 2024-08-11 16:33:38,766 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 16:33:44,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1186980.0, ans=0.1 2024-08-11 16:33:47,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.790e+01 3.167e+01 3.660e+01 5.593e+01, threshold=6.335e+01, percent-clipped=0.0 2024-08-11 16:33:58,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1187080.0, ans=0.125 2024-08-11 16:34:03,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1187080.0, ans=0.0 2024-08-11 16:34:26,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1187280.0, ans=0.0 2024-08-11 16:34:28,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-11 16:34:30,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1187280.0, ans=0.125 2024-08-11 16:34:42,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2800, loss[loss=0.1004, beats_loss=0.01328, ecapa_loss=0.0001791, whisper_loss=0.08529, over 22134.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01124, ecapa_loss=0.0001909, whisper_loss=0.09389, over 3909387.54 frames. 
], batch size: 89, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:34:50,613 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 16:34:51,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1187380.0, ans=0.0 2024-08-11 16:34:58,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1187480.0, ans=0.2 2024-08-11 16:35:04,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-08-11 16:35:08,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1187480.0, ans=0.125 2024-08-11 16:35:17,888 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 16:35:28,117 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 16:35:29,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1187580.0, ans=0.0 2024-08-11 16:35:32,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1187680.0, ans=0.125 2024-08-11 16:35:50,285 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 16:35:55,569 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 16:35:59,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. limit=10.0 2024-08-11 16:36:03,613 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 16:36:04,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2850, loss[loss=0.1233, beats_loss=0.01028, ecapa_loss=0.0002096, whisper_loss=0.1109, over 23160.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01128, ecapa_loss=0.0001909, whisper_loss=0.09393, over 3897129.96 frames. ], batch size: 92, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:36:16,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1187880.0, ans=0.125 2024-08-11 16:36:31,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.618e+01 2.962e+01 3.443e+01 5.615e+01, threshold=5.924e+01, percent-clipped=0.0 2024-08-11 16:36:37,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2024-08-11 16:36:39,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1188080.0, ans=0.0 2024-08-11 16:36:40,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.64 vs. limit=15.0 2024-08-11 16:36:51,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-11 16:36:59,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1188180.0, ans=0.125 2024-08-11 16:37:00,415 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 16:37:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1188180.0, ans=0.0 2024-08-11 16:37:28,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2900, loss[loss=0.09823, beats_loss=0.01134, ecapa_loss=0.0002009, whisper_loss=0.08488, over 21917.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01129, ecapa_loss=0.0001917, whisper_loss=0.09343, over 3870347.97 frames. ], batch size: 87, lr: 7.15e-03, grad_scale: 1.152921504606847e+18 2024-08-11 16:37:36,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1188380.0, ans=0.0 2024-08-11 16:37:36,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1188380.0, ans=0.1 2024-08-11 16:37:41,336 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 16:37:48,567 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 15 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 16:37:51,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-11 16:37:56,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-11 16:37:57,656 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 16:38:16,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. 
limit=15.0 2024-08-11 16:38:17,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-11 16:38:29,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1188780.0, ans=0.0 2024-08-11 16:38:34,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1188780.0, ans=0.2 2024-08-11 16:38:38,736 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 16:38:40,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1188780.0, ans=0.0 2024-08-11 16:38:43,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 2950, loss[loss=0.09888, beats_loss=0.01219, ecapa_loss=0.0001946, whisper_loss=0.08474, over 23565.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0001921, whisper_loss=0.09327, over 3903438.90 frames. ], batch size: 95, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:38:49,756 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 16:38:51,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1188880.0, ans=0.125 2024-08-11 16:38:58,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1188980.0, ans=15.0 2024-08-11 16:39:06,568 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.714e+01 3.075e+01 3.561e+01 5.736e+01, threshold=6.149e+01, percent-clipped=0.0 2024-08-11 16:39:14,819 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 16:39:15,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-08-11 16:39:23,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1189180.0, ans=0.0 2024-08-11 16:39:39,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1189280.0, ans=0.125 2024-08-11 16:39:49,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1189280.0, ans=0.1 2024-08-11 16:39:50,167 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 16:39:51,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3000, loss[loss=0.1083, beats_loss=0.009863, ecapa_loss=0.0002464, whisper_loss=0.09596, over 14174.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0001925, whisper_loss=0.09305, over 3884135.52 frames. ], batch size: 58, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:39:51,283 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 16:40:32,761 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on ASR_libri: loss=0.2566, beats_loss=0, ecapa_loss=0.0006312, whisper_loss=0.2502, over 922467.00 frames. 2024-08-11 16:40:50,119 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on SV_voxceleb1: loss=0.005299, beats_loss=0, ecapa_loss=0.0005299, whisper_loss=0, over 939242.00 frames. 
2024-08-11 16:41:53,241 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8466, 3.4941, 3.1587, 3.3678], device='cuda:3') 2024-08-11 16:42:48,147 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on AT_audioset: loss=0.02498, beats_loss=0.02498, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 16:42:48,151 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 16:42:50,647 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:42:57,132 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 16:43:04,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1189480.0, ans=0.125 2024-08-11 16:43:07,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1189480.0, ans=0.2 2024-08-11 16:43:22,857 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 16:43:43,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1189780.0, ans=0.025 2024-08-11 16:43:46,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-08-11 16:43:50,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1189780.0, ans=0.09899494936611666 2024-08-11 16:43:54,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3050, loss[loss=0.09861, beats_loss=0.01274, ecapa_loss=0.0001841, whisper_loss=0.08403, over 19597.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01145, ecapa_loss=0.0001915, whisper_loss=0.09225, over 3896005.80 frames. ], batch size: 80, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:43:56,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1189880.0, ans=0.07 2024-08-11 16:44:08,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1189980.0, ans=0.125 2024-08-11 16:44:16,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.646e+01 3.011e+01 3.406e+01 6.810e+01, threshold=6.022e+01, percent-clipped=0.0 2024-08-11 16:44:16,989 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 16:44:19,951 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:44:40,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1190180.0, ans=0.07 2024-08-11 16:44:40,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1190180.0, ans=0.125 2024-08-11 16:44:46,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1190180.0, ans=0.125 2024-08-11 16:45:01,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3100, loss[loss=0.1249, beats_loss=0.008909, ecapa_loss=0.0002222, whisper_loss=0.1137, over 17266.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01142, ecapa_loss=0.0001921, whisper_loss=0.09315, over 3896055.37 frames. ], batch size: 67, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:45:05,687 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 16:45:12,751 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-11 16:45:30,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=15.0 2024-08-11 16:45:32,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1190580.0, ans=0.125 2024-08-11 16:45:35,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1190580.0, ans=0.125 2024-08-11 16:45:39,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1190580.0, ans=0.0 2024-08-11 16:45:43,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2024-08-11 16:46:09,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3150, loss[loss=0.1064, beats_loss=0.009982, ecapa_loss=0.0002052, whisper_loss=0.09437, over 17139.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.000194, whisper_loss=0.09361, over 3910500.19 frames. ], batch size: 65, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:46:11,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. 
limit=15.0 2024-08-11 16:46:13,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1190880.0, ans=0.2 2024-08-11 16:46:24,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1190980.0, ans=0.125 2024-08-11 16:46:24,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1190980.0, ans=0.0 2024-08-11 16:46:25,227 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 16:46:31,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.615e+01 2.879e+01 3.586e+01 1.580e+02, threshold=5.758e+01, percent-clipped=2.0 2024-08-11 16:46:31,875 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 16:46:46,473 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-11 16:46:50,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1191180.0, ans=0.2 2024-08-11 16:46:53,577 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 16:46:54,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1191180.0, ans=0.125 2024-08-11 16:47:04,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1191280.0, ans=0.0 2024-08-11 16:47:15,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3200, loss[loss=0.1363, beats_loss=0.009542, ecapa_loss=0.000202, whisper_loss=0.1247, over 23468.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01132, ecapa_loss=0.0001944, whisper_loss=0.09385, over 3909191.95 frames. 
], batch size: 90, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:47:23,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-11 16:47:25,126 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 16:47:32,104 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 16:47:33,399 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 16:47:34,744 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 16:47:37,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1191480.0, ans=0.125 2024-08-11 16:47:55,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1191680.0, ans=0.07 2024-08-11 16:48:05,837 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 16:48:09,803 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 16:48:22,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3250, loss[loss=0.09358, beats_loss=0.01207, ecapa_loss=0.0002137, whisper_loss=0.07937, over 18644.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0001951, whisper_loss=0.09371, over 3891568.61 frames. 
], batch size: 76, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:48:31,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1191880.0, ans=0.025 2024-08-11 16:48:36,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1191980.0, ans=0.125 2024-08-11 16:48:42,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1191980.0, ans=0.125 2024-08-11 16:48:42,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1191980.0, ans=0.1 2024-08-11 16:48:45,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.517e+01 2.867e+01 3.292e+01 6.213e+01, threshold=5.733e+01, percent-clipped=1.0 2024-08-11 16:48:46,895 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 16:49:03,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1192180.0, ans=0.125 2024-08-11 16:49:07,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1192180.0, ans=0.2 2024-08-11 16:49:24,647 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 16:49:26,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1192280.0, ans=0.0 2024-08-11 16:49:29,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3300, loss[loss=0.09568, beats_loss=0.01196, ecapa_loss=0.0001598, whisper_loss=0.08213, over 20456.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01124, ecapa_loss=0.0001949, whisper_loss=0.09335, over 3904482.26 frames. 
], batch size: 78, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:49:34,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1192380.0, ans=0.0 2024-08-11 16:49:43,258 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 16:49:51,623 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 16:49:55,945 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 16:49:59,880 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 16:50:00,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-11 16:50:25,221 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 16:50:25,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1192780.0, ans=0.125 2024-08-11 16:50:28,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2024-08-11 16:50:30,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1192780.0, ans=10.0 2024-08-11 16:50:31,614 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 16:50:37,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3350, loss[loss=0.117, beats_loss=0.01045, ecapa_loss=0.0001633, whisper_loss=0.1049, over 23777.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001945, whisper_loss=0.09357, over 3898678.18 frames. 
], batch size: 91, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:50:46,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1192880.0, ans=0.5 2024-08-11 16:50:52,614 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 16:50:58,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1192980.0, ans=0.0 2024-08-11 16:50:59,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.599e+01 2.933e+01 3.463e+01 7.726e+01, threshold=5.866e+01, percent-clipped=2.0 2024-08-11 16:50:59,503 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 16:51:00,863 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 16:51:22,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1193180.0, ans=0.125 2024-08-11 16:51:23,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1193180.0, ans=0.125 2024-08-11 16:51:23,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0 2024-08-11 16:51:33,900 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 16:51:35,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1193280.0, ans=0.0 2024-08-11 16:51:36,403 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 16:51:42,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3400, loss[loss=0.09537, beats_loss=0.01374, ecapa_loss=0.0002014, whisper_loss=0.07961, over 17928.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01126, ecapa_loss=0.000193, whisper_loss=0.09302, over 3892839.83 frames. ], batch size: 79, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:51:45,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1193380.0, ans=0.0 2024-08-11 16:51:50,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1193380.0, ans=0.1 2024-08-11 16:51:53,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1193380.0, ans=0.125 2024-08-11 16:51:54,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2024-08-11 16:52:10,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1193580.0, ans=0.07 2024-08-11 16:52:15,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:17,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1193580.0, ans=0.2 2024-08-11 16:52:37,459 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 16:52:37,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1193780.0, ans=0.0 2024-08-11 16:52:39,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-11 16:52:48,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3450, loss[loss=0.0955, beats_loss=0.01186, ecapa_loss=0.0001784, whisper_loss=0.08186, over 16744.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.0001941, whisper_loss=0.0926, over 3877065.59 frames. ], batch size: 64, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:52:55,676 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 16:52:58,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1193880.0, ans=0.125 2024-08-11 16:53:11,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.578e+01 2.987e+01 3.563e+01 4.797e+01, threshold=5.975e+01, percent-clipped=0.0 2024-08-11 16:53:39,180 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 16:53:54,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3500, loss[loss=0.08554, beats_loss=0.01138, ecapa_loss=0.0001775, whisper_loss=0.07238, over 18331.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.000195, whisper_loss=0.09274, over 3869767.15 frames. ], batch size: 73, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:53:54,801 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 16:53:57,474 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 16:53:57,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1194380.0, ans=0.1 2024-08-11 16:54:02,603 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 16:54:04,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1194380.0, ans=0.125 2024-08-11 16:54:13,955 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 18 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-11 16:54:16,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1194480.0, ans=0.125 2024-08-11 16:54:34,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1194680.0, ans=0.1 2024-08-11 16:54:36,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1194680.0, ans=0.125 2024-08-11 16:54:36,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1194680.0, ans=0.125 2024-08-11 16:55:00,071 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3550, loss[loss=0.09803, beats_loss=0.0135, ecapa_loss=0.0001471, whisper_loss=0.08305, over 23096.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001935, whisper_loss=0.09257, over 3849879.21 frames. ], batch size: 90, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:55:00,200 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 16:55:02,853 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
17 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 16:55:05,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1194880.0, ans=0.2 2024-08-11 16:55:05,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1194880.0, ans=0.125 2024-08-11 16:55:10,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-11 16:55:22,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.768e+01 2.986e+01 3.532e+01 5.359e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-11 16:55:38,668 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.202e-01 2024-08-11 16:55:46,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1195180.0, ans=0.0 2024-08-11 16:55:48,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1195180.0, ans=0.125 2024-08-11 16:55:53,441 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 16:56:05,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1195280.0, ans=0.09899494936611666 2024-08-11 16:56:07,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3600, loss[loss=0.08946, beats_loss=0.01376, ecapa_loss=0.0001862, whisper_loss=0.07384, over 16346.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001933, whisper_loss=0.09306, over 3852601.87 frames. 
], batch size: 66, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:56:11,757 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.308e-01 2024-08-11 16:56:21,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1195480.0, ans=0.125 2024-08-11 16:56:27,798 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 16:56:29,207 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 16:56:36,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1195580.0, ans=0.0 2024-08-11 16:56:40,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1195580.0, ans=0.125 2024-08-11 16:56:52,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1195680.0, ans=0.2 2024-08-11 16:56:52,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1195680.0, ans=22.5 2024-08-11 16:56:56,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1195680.0, ans=0.125 2024-08-11 16:56:57,064 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 16:57:04,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1195780.0, ans=0.125 2024-08-11 16:57:13,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3650, loss[loss=0.117, beats_loss=0.01172, ecapa_loss=0.0001759, whisper_loss=0.1035, over 24164.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01119, ecapa_loss=0.0001939, whisper_loss=0.09272, over 3854451.45 frames. ], batch size: 91, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:57:13,984 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-11 16:57:22,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1195880.0, ans=0.2 2024-08-11 16:57:30,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-08-11 16:57:36,258 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.649e+01 3.037e+01 3.697e+01 5.413e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 16:58:00,479 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-11 16:58:15,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1196280.0, ans=0.0 2024-08-11 16:58:20,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1196380.0, ans=0.1 2024-08-11 16:58:21,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3700, loss[loss=0.09822, beats_loss=0.01088, ecapa_loss=0.0002196, whisper_loss=0.08514, over 19149.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001952, whisper_loss=0.09303, over 3826079.54 frames. ], batch size: 77, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:58:29,728 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 16:58:39,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1196480.0, ans=0.2 2024-08-11 16:59:15,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1196780.0, ans=0.125 2024-08-11 16:59:22,102 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-11 16:59:27,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3750, loss[loss=0.1, beats_loss=0.01221, ecapa_loss=0.0002165, whisper_loss=0.08565, over 21641.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01117, ecapa_loss=0.0001947, whisper_loss=0.09317, over 3833188.46 frames. ], batch size: 90, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:59:27,836 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 16:59:29,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1196880.0, ans=0.125 2024-08-11 16:59:29,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1196880.0, ans=0.09899494936611666 2024-08-11 16:59:45,277 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 16:59:50,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.626e+01 2.806e+01 3.237e+01 4.971e+01, threshold=5.612e+01, percent-clipped=0.0 2024-08-11 16:59:56,163 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 17:00:00,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1197080.0, ans=0.125 2024-08-11 17:00:06,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1197180.0, ans=0.0 2024-08-11 17:00:12,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1197180.0, ans=0.0 2024-08-11 17:00:25,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1197280.0, ans=0.125 2024-08-11 17:00:34,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3800, loss[loss=0.1161, beats_loss=0.01144, ecapa_loss=0.000178, whisper_loss=0.1029, over 21914.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0001955, whisper_loss=0.09376, over 3820108.15 frames. ], batch size: 89, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:00:40,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-11 17:00:45,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1197380.0, ans=0.1 2024-08-11 17:00:50,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1197480.0, ans=0.1 2024-08-11 17:01:09,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2024-08-11 17:01:19,422 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 17:01:23,568 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 17:01:32,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.06 vs. limit=10.0 2024-08-11 17:01:38,363 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 17:01:40,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3850, loss[loss=0.103, beats_loss=0.01137, ecapa_loss=0.0002104, whisper_loss=0.08954, over 17485.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01121, ecapa_loss=0.0001957, whisper_loss=0.09405, over 3840150.66 frames. ], batch size: 71, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:01:43,710 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-11 17:01:48,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1197880.0, ans=0.125 2024-08-11 17:01:53,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. 
limit=6.0 2024-08-11 17:02:03,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.720e+01 3.010e+01 3.419e+01 7.200e+01, threshold=6.020e+01, percent-clipped=2.0 2024-08-11 17:02:14,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1198080.0, ans=0.1 2024-08-11 17:02:16,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1198080.0, ans=0.125 2024-08-11 17:02:18,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1198080.0, ans=0.1 2024-08-11 17:02:28,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1198180.0, ans=0.1 2024-08-11 17:02:33,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1198280.0, ans=0.0 2024-08-11 17:02:44,803 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 17:02:47,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3900, loss[loss=0.09747, beats_loss=0.01169, ecapa_loss=0.0002083, whisper_loss=0.0837, over 14559.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01127, ecapa_loss=0.0001961, whisper_loss=0.09386, over 3858709.61 frames. ], batch size: 56, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:03:12,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1198480.0, ans=0.125 2024-08-11 17:03:19,446 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 17:03:19,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1198580.0, ans=0.125 2024-08-11 17:03:43,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1198780.0, ans=0.0 2024-08-11 17:03:53,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 3950, loss[loss=0.09967, beats_loss=0.01219, ecapa_loss=0.0001871, whisper_loss=0.08561, over 21787.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01116, ecapa_loss=0.0001961, whisper_loss=0.09464, over 3876579.53 frames. ], batch size: 88, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:03:56,393 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 17:04:09,572 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 17:04:11,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1198980.0, ans=0.125 2024-08-11 17:04:15,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.737e+01 3.009e+01 3.546e+01 1.155e+02, threshold=6.019e+01, percent-clipped=1.0 2024-08-11 17:04:17,043 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 17:04:24,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1199080.0, ans=0.1 2024-08-11 17:04:34,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1199180.0, ans=0.1 2024-08-11 17:04:40,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1199180.0, ans=0.125 2024-08-11 17:04:48,567 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 17:04:49,844 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 17:04:58,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-08-11 17:05:00,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4000, loss[loss=0.1101, beats_loss=0.0106, ecapa_loss=0.0002324, whisper_loss=0.09715, over 16762.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01111, ecapa_loss=0.0001971, whisper_loss=0.09442, over 3869801.79 frames. ], batch size: 67, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:05:02,436 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 17:05:04,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1199380.0, ans=0.2 2024-08-11 17:05:16,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.12 vs. 
limit=22.5 2024-08-11 17:05:37,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1199580.0, ans=15.0 2024-08-11 17:05:42,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1199680.0, ans=10.0 2024-08-11 17:05:57,420 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 17:05:59,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2024-08-11 17:06:11,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4050, loss[loss=0.09854, beats_loss=0.009945, ecapa_loss=0.0001497, whisper_loss=0.0871, over 15422.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01109, ecapa_loss=0.0001971, whisper_loss=0.09427, over 3853141.43 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:06:11,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1199880.0, ans=0.125 2024-08-11 17:06:14,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1199880.0, ans=0.125 2024-08-11 17:06:30,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-08-11 17:06:30,658 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
14 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 17:06:37,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.884e+01 3.098e+01 3.625e+01 5.878e+01, threshold=6.196e+01, percent-clipped=0.0 2024-08-11 17:06:48,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200080.0, ans=0.1 2024-08-11 17:07:01,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1200180.0, ans=0.125 2024-08-11 17:07:03,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2024-08-11 17:07:05,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-08-11 17:07:08,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-11 17:07:15,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-11 17:07:23,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4100, loss[loss=0.1176, beats_loss=0.009701, ecapa_loss=0.0001498, whisper_loss=0.1064, over 16866.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01117, ecapa_loss=0.0001955, whisper_loss=0.0938, over 3851089.60 frames. 
], batch size: 64, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:07:23,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1200380.0, ans=0.1 2024-08-11 17:08:01,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1200580.0, ans=0.2 2024-08-11 17:08:04,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1200680.0, ans=0.0 2024-08-11 17:08:32,517 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4150, loss[loss=0.112, beats_loss=0.01037, ecapa_loss=0.0001651, whisper_loss=0.09997, over 19418.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0001941, whisper_loss=0.09356, over 3851551.57 frames. ], batch size: 75, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:08:38,592 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 17:08:42,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1200880.0, ans=0.125 2024-08-11 17:08:44,909 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 17:08:55,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.681e+01 3.148e+01 3.707e+01 5.413e+01, threshold=6.297e+01, percent-clipped=0.0 2024-08-11 17:09:28,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1201280.0, ans=0.0 2024-08-11 17:09:29,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1201280.0, ans=0.125 2024-08-11 17:09:30,632 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 17:09:33,289 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 17:09:42,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4200, loss[loss=0.09902, beats_loss=0.01201, ecapa_loss=0.0002272, whisper_loss=0.08474, over 21163.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0112, ecapa_loss=0.0001953, whisper_loss=0.09381, over 3860386.97 frames. ], batch size: 89, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:09:55,740 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 17:09:58,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1201480.0, ans=0.0 2024-08-11 17:09:59,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2024-08-11 17:09:59,892 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 17:10:30,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1201680.0, ans=0.125 2024-08-11 17:10:31,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-11 17:10:33,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1201680.0, ans=0.2 2024-08-11 17:10:51,627 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 17:10:52,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4250, loss[loss=0.1203, beats_loss=0.01023, ecapa_loss=0.0001794, whisper_loss=0.1083, over 23085.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01132, ecapa_loss=0.0001938, whisper_loss=0.09264, over 3821434.05 frames. ], batch size: 88, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:11:09,214 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 17:11:16,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.599e+01 2.986e+01 3.415e+01 8.403e+01, threshold=5.972e+01, percent-clipped=2.0 2024-08-11 17:11:27,603 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 17:12:01,428 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4300, loss[loss=0.08936, beats_loss=0.01144, ecapa_loss=0.0001888, whisper_loss=0.07603, over 22570.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01133, ecapa_loss=0.0001924, whisper_loss=0.09187, over 3812653.09 frames. ], batch size: 91, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:12:31,595 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 17:12:34,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1202580.0, ans=0.125 2024-08-11 17:12:39,932 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-11 17:12:43,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=12.0 2024-08-11 17:12:49,123 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 17:12:49,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1202680.0, ans=0.125 2024-08-11 17:12:57,629 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 17:13:07,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.05 vs. limit=6.0 2024-08-11 17:13:11,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4350, loss[loss=0.1061, beats_loss=0.01111, ecapa_loss=0.0001827, whisper_loss=0.09313, over 22300.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01131, ecapa_loss=0.0001931, whisper_loss=0.0919, over 3831552.75 frames. ], batch size: 88, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:13:11,700 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 17:13:13,323 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 17:13:31,757 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 17:13:32,900 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 17:13:35,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.556e+01 3.068e+01 3.501e+01 5.955e+01, threshold=6.137e+01, percent-clipped=0.0 2024-08-11 17:13:41,292 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 17:13:47,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1203080.0, ans=0.025 2024-08-11 17:13:55,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2024-08-11 17:14:21,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4400, loss[loss=0.1226, beats_loss=0.009866, ecapa_loss=0.000181, whisper_loss=0.1109, over 21328.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001939, whisper_loss=0.09266, over 3822397.03 frames. ], batch size: 83, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:14:30,039 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 17:14:35,525 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 17:14:42,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1203480.0, ans=0.0 2024-08-11 17:14:45,968 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 17:14:47,181 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 17:14:59,374 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 17:15:24,167 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 17:15:25,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1203780.0, ans=0.125 2024-08-11 17:15:34,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4450, loss[loss=0.1254, beats_loss=0.009128, ecapa_loss=0.0001906, whisper_loss=0.1143, over 20562.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001938, whisper_loss=0.09258, over 3823673.50 frames. 
], batch size: 77, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:15:39,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1203880.0, ans=0.0 2024-08-11 17:15:41,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1203880.0, ans=0.2 2024-08-11 17:15:42,806 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 17:15:47,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1203880.0, ans=0.1 2024-08-11 17:15:55,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1203980.0, ans=0.2 2024-08-11 17:16:02,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.738e+01 3.141e+01 3.648e+01 6.257e+01, threshold=6.281e+01, percent-clipped=1.0 2024-08-11 17:16:09,570 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 17:16:22,392 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.127e-03 2024-08-11 17:16:27,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.01 vs. 
limit=15.0 2024-08-11 17:16:32,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204180.0, ans=0.1 2024-08-11 17:16:40,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1204280.0, ans=0.125 2024-08-11 17:16:40,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1204280.0, ans=0.125 2024-08-11 17:16:54,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4500, loss[loss=0.1111, beats_loss=0.01251, ecapa_loss=0.0001738, whisper_loss=0.09683, over 23577.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01129, ecapa_loss=0.000194, whisper_loss=0.09154, over 3834545.29 frames. ], batch size: 94, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:16:56,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1204380.0, ans=0.125 2024-08-11 17:17:15,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-08-11 17:17:58,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1204680.0, ans=0.125 2024-08-11 17:18:04,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1204780.0, ans=0.0 2024-08-11 17:18:13,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1204780.0, ans=0.125 2024-08-11 17:18:17,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4550, loss[loss=0.1063, beats_loss=0.01155, ecapa_loss=0.0001896, whisper_loss=0.09289, over 22172.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01135, ecapa_loss=0.0001953, whisper_loss=0.09124, over 3876139.25 frames. ], batch size: 91, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:18:20,971 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 17:18:28,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1204880.0, ans=0.125 2024-08-11 17:18:40,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1204980.0, ans=0.125 2024-08-11 17:18:44,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.743e+01 3.155e+01 3.839e+01 5.758e+01, threshold=6.310e+01, percent-clipped=0.0 2024-08-11 17:19:05,993 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 17:19:21,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1205280.0, ans=0.125 2024-08-11 17:19:21,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1205280.0, ans=0.2 2024-08-11 17:19:21,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1205280.0, ans=0.05 2024-08-11 17:19:25,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1205280.0, ans=0.2 2024-08-11 17:19:34,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4600, loss[loss=0.116, beats_loss=0.01127, ecapa_loss=0.0002038, whisper_loss=0.1027, over 14595.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01135, ecapa_loss=0.0001945, whisper_loss=0.09117, over 3869938.12 frames. 
], batch size: 57, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:19:42,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1205380.0, ans=0.125 2024-08-11 17:19:45,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1205380.0, ans=0.125 2024-08-11 17:19:53,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1205480.0, ans=0.125 2024-08-11 17:19:56,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1205480.0, ans=0.0 2024-08-11 17:20:22,935 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 17:20:37,339 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 17:20:48,215 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 26 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-11 17:20:54,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4650, loss[loss=0.11, beats_loss=0.01148, ecapa_loss=0.0002439, whisper_loss=0.09603, over 19376.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01129, ecapa_loss=0.0001954, whisper_loss=0.09239, over 3860033.81 frames. ], batch size: 79, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:21:19,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2024-08-11 17:21:23,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. 
limit=15.0 2024-08-11 17:21:23,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.651e+01 2.897e+01 3.330e+01 4.454e+01, threshold=5.794e+01, percent-clipped=0.0 2024-08-11 17:21:32,333 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 17:21:47,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1206180.0, ans=10.0 2024-08-11 17:21:51,377 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-11 17:21:52,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1206180.0, ans=0.1 2024-08-11 17:22:12,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4700, loss[loss=0.07728, beats_loss=0.01396, ecapa_loss=0.0001582, whisper_loss=0.06174, over 16500.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001957, whisper_loss=0.09255, over 3853998.38 frames. ], batch size: 70, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:22:16,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1206380.0, ans=0.125 2024-08-11 17:22:23,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1206380.0, ans=0.125 2024-08-11 17:22:41,183 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
10 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 17:22:53,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1206680.0, ans=0.125 2024-08-11 17:22:56,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1206680.0, ans=0.125 2024-08-11 17:22:57,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-11 17:23:19,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4750, loss[loss=0.09822, beats_loss=0.009349, ecapa_loss=0.0002415, whisper_loss=0.08646, over 21067.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01126, ecapa_loss=0.0001946, whisper_loss=0.09226, over 3857595.57 frames. ], batch size: 88, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:23:19,469 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 17:23:20,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2024-08-11 17:23:26,452 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 17:23:29,181 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 17:23:33,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. 
limit=10.0 2024-08-11 17:23:36,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1206980.0, ans=0.0 2024-08-11 17:23:40,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1206980.0, ans=15.0 2024-08-11 17:23:42,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.773e+01 3.300e+01 3.701e+01 2.356e+02, threshold=6.600e+01, percent-clipped=1.0 2024-08-11 17:24:26,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4800, loss[loss=0.09284, beats_loss=0.01149, ecapa_loss=0.000256, whisper_loss=0.07879, over 21658.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01135, ecapa_loss=0.0001953, whisper_loss=0.09157, over 3864580.17 frames. ], batch size: 94, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:24:38,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1207480.0, ans=0.1 2024-08-11 17:24:40,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1207480.0, ans=0.125 2024-08-11 17:24:41,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1207480.0, ans=0.125 2024-08-11 17:24:42,296 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-11 17:25:01,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2024-08-11 17:25:14,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1207680.0, ans=0.07 2024-08-11 17:25:25,013 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 17:25:32,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4850, loss[loss=0.1244, beats_loss=0.01055, ecapa_loss=0.0001839, whisper_loss=0.112, over 17909.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01136, ecapa_loss=0.0001947, whisper_loss=0.09211, over 3899157.66 frames. ], batch size: 66, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:25:34,442 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 17:25:43,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-08-11 17:25:53,080 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 17:25:55,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.586e+01 2.829e+01 3.279e+01 4.850e+01, threshold=5.658e+01, percent-clipped=0.0 2024-08-11 17:26:07,989 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 17:26:11,960 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 17:26:21,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1208180.0, ans=0.1 2024-08-11 17:26:22,491 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 17:26:25,202 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 17:26:27,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=10.0 2024-08-11 17:26:30,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1208280.0, ans=0.125 2024-08-11 17:26:31,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1208280.0, ans=0.125 2024-08-11 17:26:32,835 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-11 17:26:38,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1208380.0, ans=0.0 2024-08-11 17:26:39,303 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4900, loss[loss=0.09402, beats_loss=0.0101, ecapa_loss=0.0001624, whisper_loss=0.08229, over 18129.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01137, ecapa_loss=0.0001932, whisper_loss=0.09233, over 3901211.97 frames. ], batch size: 71, lr: 7.09e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:26:47,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2024-08-11 17:26:58,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-11 17:27:04,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1208480.0, ans=0.0 2024-08-11 17:27:09,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1208580.0, ans=0.125 2024-08-11 17:27:18,443 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 17:27:47,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1208780.0, ans=0.0 2024-08-11 17:27:50,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 4950, loss[loss=0.1434, beats_loss=0.008223, ecapa_loss=0.0002108, whisper_loss=0.1331, over 16192.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01136, ecapa_loss=0.0001939, whisper_loss=0.09218, over 3867927.41 frames. ], batch size: 62, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:27:57,677 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 17:27:59,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1208880.0, ans=10.0 2024-08-11 17:28:02,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1208880.0, ans=0.2 2024-08-11 17:28:03,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1208980.0, ans=0.0 2024-08-11 17:28:08,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1208980.0, ans=0.125 2024-08-11 17:28:08,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1208980.0, ans=0.0 2024-08-11 17:28:10,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.05 vs. 
limit=15.0 2024-08-11 17:28:14,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1208980.0, ans=0.125 2024-08-11 17:28:15,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.567e+01 2.832e+01 3.214e+01 4.886e+01, threshold=5.664e+01, percent-clipped=0.0 2024-08-11 17:28:27,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1209080.0, ans=0.0 2024-08-11 17:28:30,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1209080.0, ans=0.0 2024-08-11 17:28:34,723 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 12 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-11 17:28:37,489 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 17:28:43,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1209180.0, ans=0.125 2024-08-11 17:28:58,082 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 17:29:03,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2024-08-11 17:29:04,872 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5000, loss[loss=0.09202, beats_loss=0.0125, ecapa_loss=0.0001778, whisper_loss=0.07775, over 21961.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01131, ecapa_loss=0.0001918, whisper_loss=0.0932, over 3876047.84 frames. ], batch size: 88, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:29:11,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. 
limit=15.0 2024-08-11 17:29:14,817 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 17:29:34,380 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 17:29:45,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1209580.0, ans=0.125 2024-08-11 17:30:03,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1209780.0, ans=0.2 2024-08-11 17:30:09,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1209780.0, ans=0.2 2024-08-11 17:30:19,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5050, loss[loss=0.09024, beats_loss=0.01397, ecapa_loss=0.000166, whisper_loss=0.07462, over 21619.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001933, whisper_loss=0.09272, over 3891139.78 frames. ], batch size: 86, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:30:19,440 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 17:30:20,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1209880.0, ans=0.2 2024-08-11 17:30:27,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1209880.0, ans=0.0 2024-08-11 17:30:41,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=12.0 2024-08-11 17:30:43,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1209980.0, ans=0.2 2024-08-11 17:30:44,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.569e+01 2.847e+01 3.482e+01 7.100e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 17:30:56,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1210080.0, ans=0.125 2024-08-11 17:31:00,474 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 17:31:07,922 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 17:31:10,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1210180.0, ans=0.125 2024-08-11 17:31:13,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1210180.0, ans=0.125 2024-08-11 17:31:35,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5100, loss[loss=0.1008, beats_loss=0.01034, ecapa_loss=0.00024, whisper_loss=0.08807, over 15563.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01138, ecapa_loss=0.0001928, whisper_loss=0.09317, over 3909443.93 frames. ], batch size: 64, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:31:42,655 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 17:32:03,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1210480.0, ans=0.2 2024-08-11 17:32:07,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.21 vs. 
limit=22.5 2024-08-11 17:32:21,459 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 17:32:32,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1210680.0, ans=0.0 2024-08-11 17:32:36,353 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 17:32:41,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1210780.0, ans=0.125 2024-08-11 17:32:55,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5150, loss[loss=0.1014, beats_loss=0.0117, ecapa_loss=0.0002046, whisper_loss=0.08762, over 17742.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01135, ecapa_loss=0.0001936, whisper_loss=0.09307, over 3906094.08 frames. ], batch size: 71, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:33:22,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.614e+01 3.081e+01 3.730e+01 5.554e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 17:34:04,551 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 17:34:05,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1211280.0, ans=0.025 2024-08-11 17:34:11,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5200, loss[loss=0.1039, beats_loss=0.01199, ecapa_loss=0.0001967, whisper_loss=0.08994, over 22618.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01126, ecapa_loss=0.0001926, whisper_loss=0.09391, over 3921906.72 frames. 
], batch size: 91, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:34:16,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1211380.0, ans=0.125 2024-08-11 17:34:19,342 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 17:34:22,593 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 17:34:25,142 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 17:34:40,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1211480.0, ans=0.125 2024-08-11 17:35:25,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1211780.0, ans=0.125 2024-08-11 17:35:27,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1211780.0, ans=0.09899494936611666 2024-08-11 17:35:29,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5250, loss[loss=0.1098, beats_loss=0.0123, ecapa_loss=0.000183, whisper_loss=0.09563, over 22243.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01123, ecapa_loss=0.0001927, whisper_loss=0.09367, over 3905503.98 frames. ], batch size: 90, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:35:46,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1211980.0, ans=0.2 2024-08-11 17:35:57,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.672e+01 3.061e+01 3.448e+01 6.321e+01, threshold=6.122e+01, percent-clipped=2.0 2024-08-11 17:36:08,875 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 17:36:33,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1212280.0, ans=0.0 2024-08-11 17:36:48,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5300, loss[loss=0.1091, beats_loss=0.01191, ecapa_loss=0.0001811, whisper_loss=0.09538, over 21769.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01121, ecapa_loss=0.0001939, whisper_loss=0.09333, over 3893925.04 frames. ], batch size: 86, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:36:52,691 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 17:36:55,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1212380.0, ans=0.1 2024-08-11 17:37:09,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1212480.0, ans=0.1 2024-08-11 17:37:19,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1212580.0, ans=0.0 2024-08-11 17:37:29,188 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 17:37:35,043 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 17:38:04,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5350, loss[loss=0.1172, beats_loss=0.01088, ecapa_loss=0.0001788, whisper_loss=0.1046, over 19684.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0113, ecapa_loss=0.0001933, whisper_loss=0.09288, over 3886650.07 frames. ], batch size: 74, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:38:08,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. 
limit=12.0 2024-08-11 17:38:23,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1212980.0, ans=0.125 2024-08-11 17:38:29,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.524e+01 2.904e+01 3.271e+01 6.276e+01, threshold=5.808e+01, percent-clipped=1.0 2024-08-11 17:38:30,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1212980.0, ans=0.125 2024-08-11 17:38:36,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1213080.0, ans=0.125 2024-08-11 17:38:40,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-11 17:39:12,025 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 17:39:15,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1213280.0, ans=0.0 2024-08-11 17:39:23,904 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 17:39:25,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5400, loss[loss=0.09277, beats_loss=0.01209, ecapa_loss=0.0002044, whisper_loss=0.07864, over 20751.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01131, ecapa_loss=0.0001927, whisper_loss=0.09209, over 3902578.92 frames. ], batch size: 86, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:39:29,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-08-11 17:39:37,781 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
23 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 17:40:26,019 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 17:40:35,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1213780.0, ans=0.95 2024-08-11 17:40:38,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1213780.0, ans=0.0 2024-08-11 17:40:44,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5450, loss[loss=0.102, beats_loss=0.009712, ecapa_loss=0.0002241, whisper_loss=0.09004, over 20925.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0113, ecapa_loss=0.0001908, whisper_loss=0.09252, over 3851723.07 frames. ], batch size: 83, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:40:57,592 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 17:41:01,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5 2024-08-11 17:41:09,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1213980.0, ans=0.125 2024-08-11 17:41:11,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.638e+01 2.966e+01 3.379e+01 5.199e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 17:41:15,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1214080.0, ans=0.1 2024-08-11 17:41:24,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. 
limit=10.0 2024-08-11 17:41:35,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1214180.0, ans=0.04949747468305833 2024-08-11 17:41:41,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1214180.0, ans=0.05 2024-08-11 17:42:03,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5500, loss[loss=0.1023, beats_loss=0.01244, ecapa_loss=0.0002296, whisper_loss=0.08752, over 21662.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001936, whisper_loss=0.09282, over 3890338.21 frames. ], batch size: 89, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:42:10,787 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 17:42:13,700 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 17:42:37,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1214580.0, ans=0.0 2024-08-11 17:42:44,335 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 17:42:54,415 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 17:42:59,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1214680.0, ans=0.125 2024-08-11 17:43:05,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1214680.0, ans=0.0 2024-08-11 17:43:09,719 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 17:43:10,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1214780.0, ans=0.125 2024-08-11 17:43:15,561 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 17:43:25,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5550, loss[loss=0.1116, beats_loss=0.01325, ecapa_loss=0.0001718, whisper_loss=0.09667, over 22658.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001947, whisper_loss=0.09304, over 3899660.07 frames. ], batch size: 91, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:43:52,120 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 17:43:52,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1214980.0, ans=0.125 2024-08-11 17:43:53,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.662e+01 2.905e+01 3.480e+01 6.680e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-11 17:43:56,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1214980.0, ans=0.0 2024-08-11 17:43:59,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1215080.0, ans=0.05 2024-08-11 17:44:09,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2024-08-11 17:44:46,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5600, loss[loss=0.0991, beats_loss=0.01202, ecapa_loss=0.0001904, whisper_loss=0.08518, over 17991.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01127, ecapa_loss=0.0001929, whisper_loss=0.09328, over 3896062.47 frames. ], batch size: 70, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:44:46,687 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 22 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-11 17:44:48,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1215380.0, ans=0.125 2024-08-11 17:45:00,615 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 17:45:17,180 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 17:45:39,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1215680.0, ans=0.125 2024-08-11 17:45:43,013 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 17:45:51,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1215780.0, ans=0.125 2024-08-11 17:46:03,061 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 17:46:05,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5650, loss[loss=0.1126, beats_loss=0.01035, ecapa_loss=0.0002267, whisper_loss=0.09999, over 17466.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001925, whisper_loss=0.0931, over 3925705.48 frames. ], batch size: 72, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:46:19,519 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 17:46:31,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.596e+01 3.008e+01 3.518e+01 5.757e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 17:46:34,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1215980.0, ans=0.1 2024-08-11 17:46:36,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1216080.0, ans=0.0 2024-08-11 17:46:54,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1216180.0, ans=0.125 2024-08-11 17:47:02,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216180.0, ans=0.1 2024-08-11 17:47:17,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1216280.0, ans=0.0 2024-08-11 17:47:22,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5700, loss[loss=0.1197, beats_loss=0.01023, ecapa_loss=0.0001606, whisper_loss=0.1079, over 19559.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01133, ecapa_loss=0.0001911, whisper_loss=0.0934, over 3924501.76 frames. ], batch size: 74, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:47:25,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1216380.0, ans=0.125 2024-08-11 17:47:34,209 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 17:48:21,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1216680.0, ans=0.125 2024-08-11 17:48:22,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1216680.0, ans=0.125 2024-08-11 17:48:42,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5750, loss[loss=0.1006, beats_loss=0.01154, ecapa_loss=0.0002006, whisper_loss=0.08707, over 18738.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01134, ecapa_loss=0.0001917, whisper_loss=0.09339, over 3911630.80 frames. ], batch size: 77, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:48:46,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-11 17:49:04,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0 2024-08-11 17:49:05,190 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:49:07,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=22.5 2024-08-11 17:49:08,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.576e+01 2.968e+01 3.290e+01 6.597e+01, threshold=5.936e+01, percent-clipped=1.0 2024-08-11 17:49:15,294 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 17:49:33,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1217180.0, ans=0.125 2024-08-11 17:49:37,943 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-11 17:49:57,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-11 17:50:00,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5800, loss[loss=0.09858, beats_loss=0.01325, ecapa_loss=0.0001351, whisper_loss=0.08397, over 19697.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01129, ecapa_loss=0.0001908, whisper_loss=0.09356, over 3910323.19 frames. ], batch size: 79, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:50:20,048 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 17:50:29,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1217580.0, ans=0.125 2024-08-11 17:50:46,719 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 17:50:58,713 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 22 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-11 17:51:15,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5850, loss[loss=0.09536, beats_loss=0.01223, ecapa_loss=0.0001873, whisper_loss=0.08127, over 22803.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0001924, whisper_loss=0.09304, over 3915209.44 frames. ], batch size: 91, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:51:30,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1217980.0, ans=0.125 2024-08-11 17:51:39,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.535e+01 2.906e+01 3.221e+01 4.693e+01, threshold=5.811e+01, percent-clipped=0.0 2024-08-11 17:51:40,230 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 17:51:40,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217980.0, ans=0.1 2024-08-11 17:52:13,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1218280.0, ans=0.125 2024-08-11 17:52:29,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5900, loss[loss=0.1027, beats_loss=0.01181, ecapa_loss=0.0001962, whisper_loss=0.0889, over 22363.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01128, ecapa_loss=0.0001919, whisper_loss=0.09333, over 3933369.68 frames. ], batch size: 92, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:52:35,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1218380.0, ans=0.125 2024-08-11 17:53:28,657 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 17:53:43,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1218780.0, ans=0.0 2024-08-11 17:53:47,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1218880.0, ans=0.125 2024-08-11 17:53:47,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 5950, loss[loss=0.08578, beats_loss=0.01499, ecapa_loss=0.0001722, whisper_loss=0.06907, over 22681.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0001914, whisper_loss=0.0932, over 3934474.57 frames. ], batch size: 93, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:53:52,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. 
limit=6.0 2024-08-11 17:54:13,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.543e+01 2.844e+01 3.292e+01 4.976e+01, threshold=5.688e+01, percent-clipped=0.0 2024-08-11 17:54:37,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-11 17:54:47,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1219280.0, ans=0.0 2024-08-11 17:54:52,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1219280.0, ans=0.125 2024-08-11 17:55:03,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6000, loss[loss=0.1139, beats_loss=0.01118, ecapa_loss=0.0002501, whisper_loss=0.1002, over 19347.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001928, whisper_loss=0.09314, over 3913266.32 frames. ], batch size: 80, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:55:03,135 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 17:55:39,322 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006361, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 17:55:57,612 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on SV_voxceleb1: loss=0.005086, beats_loss=0, ecapa_loss=0.0005086, whisper_loss=0, over 939242.00 frames. 2024-08-11 17:57:42,088 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on AT_audioset: loss=0.02513, beats_loss=0.02513, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 17:57:42,091 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 17:58:23,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1219580.0, ans=0.125 2024-08-11 17:58:55,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1219780.0, ans=0.125 2024-08-11 17:58:58,417 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 17:59:06,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1219880.0, ans=0.1 2024-08-11 17:59:06,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6050, loss[loss=0.1121, beats_loss=0.01306, ecapa_loss=0.0001593, whisper_loss=0.09747, over 22562.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01124, ecapa_loss=0.0001919, whisper_loss=0.09376, over 3894652.88 frames. ], batch size: 88, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:59:07,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1219880.0, ans=0.125 2024-08-11 17:59:10,681 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 17:59:22,130 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 17:59:22,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1219980.0, ans=0.0 2024-08-11 17:59:34,509 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.877e+01 3.382e+01 4.916e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-11 17:59:44,628 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 18:00:29,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6100, loss[loss=0.1207, beats_loss=0.01027, ecapa_loss=0.0001888, whisper_loss=0.1086, over 16533.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01123, ecapa_loss=0.0001918, whisper_loss=0.09365, over 3899639.13 frames. ], batch size: 67, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:00:54,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-11 18:00:58,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1220480.0, ans=0.05 2024-08-11 18:01:17,002 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 18:01:30,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1220680.0, ans=0.0 2024-08-11 18:01:36,286 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-11 18:01:37,820 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 18:01:47,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2024-08-11 18:01:52,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1220880.0, ans=0.125 2024-08-11 18:01:52,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6150, loss[loss=0.09925, beats_loss=0.009682, ecapa_loss=0.0002342, whisper_loss=0.08722, over 18885.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01114, ecapa_loss=0.0001936, whisper_loss=0.09383, over 3919546.53 frames. 
], batch size: 76, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:01:54,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1220880.0, ans=0.1 2024-08-11 18:02:06,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1220880.0, ans=0.125 2024-08-11 18:02:20,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.665e+01 2.922e+01 3.415e+01 6.689e+01, threshold=5.844e+01, percent-clipped=1.0 2024-08-11 18:02:32,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2024-08-11 18:02:41,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1221180.0, ans=0.0 2024-08-11 18:03:11,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6200, loss[loss=0.08886, beats_loss=0.01437, ecapa_loss=0.0001531, whisper_loss=0.07295, over 22122.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01116, ecapa_loss=0.000193, whisper_loss=0.09305, over 3883365.55 frames. ], batch size: 88, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:03:18,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-11 18:03:59,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1221680.0, ans=0.05 2024-08-11 18:04:00,938 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:04:10,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1221680.0, ans=0.125 2024-08-11 18:04:12,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2024-08-11 18:04:24,050 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 18:04:25,576 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 18:04:31,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6250, loss[loss=0.09688, beats_loss=0.0128, ecapa_loss=0.0001845, whisper_loss=0.08223, over 21783.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01125, ecapa_loss=0.0001932, whisper_loss=0.09222, over 3915342.82 frames. ], batch size: 89, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:04:33,036 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 18:04:37,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1221880.0, ans=0.2 2024-08-11 18:04:42,709 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:04:58,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.590e+01 2.864e+01 3.315e+01 6.460e+01, threshold=5.728e+01, percent-clipped=1.0 2024-08-11 18:05:04,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1222080.0, ans=0.1 2024-08-11 18:05:17,379 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-11 18:05:23,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1222180.0, ans=0.125 2024-08-11 18:05:52,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6300, loss[loss=0.1074, beats_loss=0.009177, ecapa_loss=0.0002229, whisper_loss=0.09601, over 22379.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01123, ecapa_loss=0.0001936, whisper_loss=0.09262, over 3911043.41 frames. ], batch size: 93, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:06:21,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1222380.0, ans=0.0 2024-08-11 18:06:36,012 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 18:06:57,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1222580.0, ans=0.125 2024-08-11 18:07:23,545 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 18:07:35,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1222780.0, ans=0.0 2024-08-11 18:07:46,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6350, loss[loss=0.1214, beats_loss=0.008845, ecapa_loss=0.0002211, whisper_loss=0.1103, over 16781.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01119, ecapa_loss=0.0001943, whisper_loss=0.0923, over 3876516.43 frames. ], batch size: 65, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:07:47,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. 
limit=6.0 2024-08-11 18:07:54,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-08-11 18:08:14,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1222980.0, ans=0.0 2024-08-11 18:08:17,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-11 18:08:17,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.576e+01 2.972e+01 3.431e+01 4.977e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 18:08:30,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1223080.0, ans=0.125 2024-08-11 18:08:56,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1223180.0, ans=0.125 2024-08-11 18:09:12,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1223280.0, ans=0.125 2024-08-11 18:09:22,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1223280.0, ans=15.0 2024-08-11 18:09:32,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6400, loss[loss=0.1196, beats_loss=0.01013, ecapa_loss=0.0001986, whisper_loss=0.1074, over 17146.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01125, ecapa_loss=0.000192, whisper_loss=0.0931, over 3907838.91 frames. ], batch size: 64, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:09:42,035 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 18:09:42,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-11 18:09:53,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1223480.0, ans=0.1 2024-08-11 18:09:53,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1223480.0, ans=0.125 2024-08-11 18:10:04,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1223480.0, ans=0.0 2024-08-11 18:10:21,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1223580.0, ans=0.125 2024-08-11 18:10:34,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-11 18:10:35,962 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 18:11:05,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1223780.0, ans=0.2 2024-08-11 18:11:23,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6450, loss[loss=0.09321, beats_loss=0.01295, ecapa_loss=0.0002026, whisper_loss=0.07823, over 22031.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.0001919, whisper_loss=0.09273, over 3909991.07 frames. 
], batch size: 94, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:11:27,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1223880.0, ans=0.07 2024-08-11 18:11:30,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1223880.0, ans=0.2 2024-08-11 18:11:39,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-08-11 18:11:42,636 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 18:11:49,923 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 18:12:01,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223980.0, ans=0.1 2024-08-11 18:12:08,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.662e+01 3.047e+01 3.508e+01 5.395e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 18:12:57,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-11 18:13:26,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2024-08-11 18:13:26,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6500, loss[loss=0.115, beats_loss=0.01093, ecapa_loss=0.0002065, whisper_loss=0.102, over 21663.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.0001917, whisper_loss=0.09307, over 3925475.69 frames. ], batch size: 88, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:13:29,978 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
15 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 18:13:49,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-11 18:13:53,411 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 18:14:13,256 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 18:14:14,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1224580.0, ans=0.125 2024-08-11 18:14:32,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-11 18:14:49,631 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-11 18:14:57,657 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.094e-02 2024-08-11 18:15:24,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6550, loss[loss=0.08977, beats_loss=0.01032, ecapa_loss=0.0002379, whisper_loss=0.07708, over 20907.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01133, ecapa_loss=0.0001934, whisper_loss=0.0926, over 3917395.36 frames. 
], batch size: 89, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:15:36,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1224880.0, ans=0.1 2024-08-11 18:15:46,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1224880.0, ans=0.125 2024-08-11 18:16:06,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.812e+01 3.232e+01 4.010e+01 5.660e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-11 18:16:32,963 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 18:16:47,104 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 18:16:57,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1225280.0, ans=0.0 2024-08-11 18:17:00,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1225280.0, ans=0.1 2024-08-11 18:17:05,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6600, loss[loss=0.09905, beats_loss=0.01254, ecapa_loss=0.0001445, whisper_loss=0.08507, over 20565.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001939, whisper_loss=0.09324, over 3929460.75 frames. ], batch size: 81, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:17:07,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-08-11 18:17:17,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1225380.0, ans=0.0 2024-08-11 18:17:20,122 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
24 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-11 18:17:52,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-08-11 18:18:33,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-11 18:18:33,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6650, loss[loss=0.06409, beats_loss=0.01307, ecapa_loss=0.0002357, whisper_loss=0.04866, over 14765.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01135, ecapa_loss=0.0001934, whisper_loss=0.09302, over 3911429.79 frames. ], batch size: 64, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:18:44,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=12.0 2024-08-11 18:18:48,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1225880.0, ans=0.125 2024-08-11 18:19:02,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-11 18:19:02,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.730e+01 3.226e+01 3.856e+01 7.096e+01, threshold=6.452e+01, percent-clipped=1.0 2024-08-11 18:19:13,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-11 18:19:18,229 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 18:19:35,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. 
limit=6.0 2024-08-11 18:19:48,237 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 18:19:53,827 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 33 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 18:20:00,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6700, loss[loss=0.1218, beats_loss=0.00611, ecapa_loss=0.0002327, whisper_loss=0.1133, over 23945.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01129, ecapa_loss=0.000194, whisper_loss=0.09358, over 3915181.38 frames. ], batch size: 92, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:20:10,243 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 18:20:29,211 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 18:21:01,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1226680.0, ans=0.125 2024-08-11 18:21:01,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-11 18:21:20,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1226780.0, ans=0.0 2024-08-11 18:21:25,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6750, loss[loss=0.1186, beats_loss=0.01044, ecapa_loss=0.0001859, whisper_loss=0.1063, over 17695.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0001947, whisper_loss=0.09296, over 3856349.01 frames. ], batch size: 71, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:21:56,007 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
25 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 18:21:57,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.725e+01 3.041e+01 3.593e+01 5.305e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 18:22:03,999 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 18:22:04,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1227080.0, ans=0.2 2024-08-11 18:22:25,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1227180.0, ans=0.125 2024-08-11 18:22:30,643 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:22:34,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1227280.0, ans=0.1 2024-08-11 18:22:36,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1227280.0, ans=0.125 2024-08-11 18:22:41,001 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 18:22:45,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1227280.0, ans=0.0 2024-08-11 18:22:51,081 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 18:22:52,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6800, loss[loss=0.09917, beats_loss=0.01387, ecapa_loss=0.000188, whisper_loss=0.08342, over 23361.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001951, whisper_loss=0.09302, over 3848385.48 frames. 
], batch size: 95, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:23:00,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1227380.0, ans=0.0 2024-08-11 18:23:15,444 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 18:23:15,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1227480.0, ans=0.0 2024-08-11 18:23:16,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1227480.0, ans=0.125 2024-08-11 18:23:24,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.59 vs. limit=22.5 2024-08-11 18:23:26,979 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 18:23:32,673 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 18:23:40,913 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 18:23:43,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1227680.0, ans=0.1 2024-08-11 18:23:45,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0 2024-08-11 18:23:45,874 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-11 18:23:58,129 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 18:24:02,149 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 18:24:19,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-11 18:24:20,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6850, loss[loss=0.091, beats_loss=0.01367, ecapa_loss=0.0001684, whisper_loss=0.07565, over 15782.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.0001957, whisper_loss=0.09286, over 3826212.05 frames. ], batch size: 63, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:24:41,053 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 18:24:45,042 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 18:24:49,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1227980.0, ans=0.1 2024-08-11 18:24:52,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.557e+01 2.801e+01 3.138e+01 4.430e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-11 18:24:58,481 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 18:25:07,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1228080.0, ans=0.125 2024-08-11 18:25:08,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1228080.0, ans=0.125 2024-08-11 18:25:17,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1228180.0, ans=0.125 2024-08-11 18:25:19,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1228180.0, ans=0.2 2024-08-11 18:25:52,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6900, loss[loss=0.1141, beats_loss=0.01141, ecapa_loss=0.0002306, whisper_loss=0.1004, over 21347.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0113, ecapa_loss=0.0001959, whisper_loss=0.09272, over 3861528.66 frames. ], batch size: 89, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:25:58,079 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 18:26:01,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1228380.0, ans=0.125 2024-08-11 18:26:10,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1228480.0, ans=0.1 2024-08-11 18:26:18,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2024-08-11 18:26:20,933 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 18:26:43,013 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 18:27:06,173 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 18:27:07,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1228780.0, ans=0.125 2024-08-11 18:27:23,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 6950, loss[loss=0.1054, beats_loss=0.01215, ecapa_loss=0.0002236, whisper_loss=0.091, over 16701.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0001942, whisper_loss=0.09295, over 3890541.50 frames. ], batch size: 71, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:27:33,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1228880.0, ans=0.125 2024-08-11 18:27:51,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1228980.0, ans=0.04949747468305833 2024-08-11 18:27:53,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1228980.0, ans=0.0 2024-08-11 18:27:55,235 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 18:27:55,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1228980.0, ans=0.125 2024-08-11 18:27:56,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.618e+01 3.004e+01 3.400e+01 5.942e+01, threshold=6.008e+01, percent-clipped=1.0 2024-08-11 18:28:00,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1229080.0, ans=0.035 2024-08-11 18:28:11,898 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 18:28:18,907 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 18:28:20,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1229180.0, ans=0.0 2024-08-11 18:28:34,189 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 18:28:54,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7000, loss[loss=0.1141, beats_loss=0.01202, ecapa_loss=0.0002249, whisper_loss=0.09979, over 21892.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001948, whisper_loss=0.09301, over 3900442.99 frames. ], batch size: 91, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:28:56,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1229380.0, ans=0.125 2024-08-11 18:28:58,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1229380.0, ans=0.2 2024-08-11 18:29:05,074 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-11 18:29:10,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1229480.0, ans=0.0 2024-08-11 18:29:12,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-11 18:29:13,600 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 18:29:16,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1229480.0, ans=0.0 2024-08-11 18:29:19,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1229480.0, ans=0.125 2024-08-11 18:29:42,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1229580.0, ans=0.125 2024-08-11 18:29:44,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1229580.0, ans=0.125 2024-08-11 18:30:15,706 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 18:30:23,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7050, loss[loss=0.08044, beats_loss=0.01571, ecapa_loss=0.0001772, whisper_loss=0.06296, over 19514.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01137, ecapa_loss=0.0001931, whisper_loss=0.09297, over 3921880.97 frames. ], batch size: 83, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:30:27,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1229880.0, ans=0.125 2024-08-11 18:30:37,371 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
34 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 18:30:52,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1229980.0, ans=0.0 2024-08-11 18:30:54,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.713e+01 3.050e+01 3.555e+01 6.661e+01, threshold=6.100e+01, percent-clipped=2.0 2024-08-11 18:30:56,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1229980.0, ans=0.125 2024-08-11 18:31:04,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1230080.0, ans=0.125 2024-08-11 18:31:05,970 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 18:31:26,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1230180.0, ans=0.125 2024-08-11 18:31:28,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1230180.0, ans=0.125 2024-08-11 18:31:34,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1230280.0, ans=0.0 2024-08-11 18:31:37,854 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 18:31:37,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1230280.0, ans=0.015 2024-08-11 18:31:52,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=15.0 2024-08-11 18:31:52,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7100, loss[loss=0.08957, beats_loss=0.01229, ecapa_loss=0.0001849, whisper_loss=0.07543, over 13509.00 frames. 
], tot_loss[loss=0.1061, beats_loss=0.01135, ecapa_loss=0.0001916, whisper_loss=0.09279, over 3896351.67 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:32:03,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1230380.0, ans=0.1 2024-08-11 18:32:16,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1230480.0, ans=0.125 2024-08-11 18:32:51,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1230680.0, ans=0.125 2024-08-11 18:32:54,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1230680.0, ans=0.2 2024-08-11 18:33:10,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1230780.0, ans=0.2 2024-08-11 18:33:19,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1230780.0, ans=0.2 2024-08-11 18:33:21,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7150, loss[loss=0.1178, beats_loss=0.01087, ecapa_loss=0.0001751, whisper_loss=0.1051, over 24037.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01125, ecapa_loss=0.0001915, whisper_loss=0.09341, over 3899563.84 frames. ], batch size: 95, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:33:22,613 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 18:33:54,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1230980.0, ans=0.125 2024-08-11 18:33:54,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.688e+01 3.029e+01 3.368e+01 5.006e+01, threshold=6.058e+01, percent-clipped=0.0 2024-08-11 18:34:05,222 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-11 18:34:14,873 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 7 from Vox, 36 fro AS 2024-08-11 18:34:16,617 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 18:34:51,841 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 18:34:53,929 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-11 18:34:55,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7200, loss[loss=0.09338, beats_loss=0.01403, ecapa_loss=0.0001434, whisper_loss=0.07791, over 22749.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001895, whisper_loss=0.09326, over 3915828.70 frames. ], batch size: 89, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:34:55,480 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 18:34:58,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1231380.0, ans=0.125 2024-08-11 18:35:22,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2024-08-11 18:35:28,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1231580.0, ans=0.125 2024-08-11 18:35:48,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1231680.0, ans=0.5 2024-08-11 18:35:51,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2024-08-11 18:35:55,734 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-11 18:36:19,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231780.0, ans=0.1 2024-08-11 18:36:19,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2024-08-11 18:36:21,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7250, loss[loss=0.1394, beats_loss=0.007008, ecapa_loss=0.0002011, whisper_loss=0.1303, over 20117.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0001901, whisper_loss=0.09339, over 3911515.48 frames. ], batch size: 77, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:36:22,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1231880.0, ans=0.125 2024-08-11 18:36:24,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1231880.0, ans=0.125 2024-08-11 18:36:29,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-08-11 18:36:33,818 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
24 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-11 18:36:37,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1231980.0, ans=0.1 2024-08-11 18:36:41,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1231980.0, ans=0.0 2024-08-11 18:36:51,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.618e+01 2.954e+01 3.399e+01 5.489e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 18:37:04,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1232080.0, ans=0.0 2024-08-11 18:37:34,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2024-08-11 18:37:37,609 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 18:37:45,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7300, loss[loss=0.1155, beats_loss=0.01064, ecapa_loss=0.0001882, whisper_loss=0.103, over 17741.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01127, ecapa_loss=0.0001902, whisper_loss=0.09337, over 3898737.70 frames. ], batch size: 69, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:37:54,902 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 18:38:11,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1232480.0, ans=0.2 2024-08-11 18:38:18,273 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 18:38:35,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1232680.0, ans=0.125 2024-08-11 18:38:38,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1232680.0, ans=0.0 2024-08-11 18:38:43,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2024-08-11 18:38:54,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1232780.0, ans=0.1 2024-08-11 18:39:05,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-11 18:39:09,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7350, loss[loss=0.1099, beats_loss=0.01101, ecapa_loss=0.0002299, whisper_loss=0.09656, over 21047.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01135, ecapa_loss=0.00019, whisper_loss=0.09268, over 3886555.74 frames. ], batch size: 90, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:39:31,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1232980.0, ans=0.125 2024-08-11 18:39:38,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2024-08-11 18:39:39,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.552e+01 3.033e+01 3.374e+01 5.510e+01, threshold=6.067e+01, percent-clipped=0.0 2024-08-11 18:39:54,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. 
limit=10.0 2024-08-11 18:40:13,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-08-11 18:40:23,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1233280.0, ans=0.0 2024-08-11 18:40:32,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7400, loss[loss=0.1049, beats_loss=0.01051, ecapa_loss=0.0002095, whisper_loss=0.09233, over 23111.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01135, ecapa_loss=0.0001903, whisper_loss=0.09244, over 3874142.05 frames. ], batch size: 94, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:40:43,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1233380.0, ans=0.0 2024-08-11 18:40:44,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1233380.0, ans=0.125 2024-08-11 18:41:30,019 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:41:41,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1233780.0, ans=0.1 2024-08-11 18:41:44,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1233780.0, ans=0.125 2024-08-11 18:41:55,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7450, loss[loss=0.1072, beats_loss=0.008202, ecapa_loss=0.0002627, whisper_loss=0.09634, over 20497.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001907, whisper_loss=0.09275, over 3866129.33 frames. 
], batch size: 88, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:42:28,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.705e+01 3.012e+01 3.463e+01 6.106e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-11 18:43:00,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1234180.0, ans=0.2 2024-08-11 18:43:22,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7500, loss[loss=0.1068, beats_loss=0.01107, ecapa_loss=0.0002906, whisper_loss=0.09281, over 19004.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01133, ecapa_loss=0.0001916, whisper_loss=0.09313, over 3873228.57 frames. ], batch size: 84, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:43:26,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1234380.0, ans=0.125 2024-08-11 18:43:26,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1234380.0, ans=0.125 2024-08-11 18:43:32,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1234380.0, ans=0.2 2024-08-11 18:43:52,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1234480.0, ans=0.0 2024-08-11 18:44:10,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1234680.0, ans=0.1 2024-08-11 18:44:11,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1234680.0, ans=0.09899494936611666 2024-08-11 18:44:25,193 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 18:44:40,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1234780.0, ans=0.125 2024-08-11 18:44:44,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7550, loss[loss=0.1054, beats_loss=0.01004, ecapa_loss=0.000217, whisper_loss=0.09322, over 22300.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01135, ecapa_loss=0.0001914, whisper_loss=0.0928, over 3832088.59 frames. ], batch size: 92, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:44:53,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=12.0 2024-08-11 18:44:54,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1234880.0, ans=0.125 2024-08-11 18:44:55,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1234880.0, ans=0.0 2024-08-11 18:45:01,371 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.791e-01 2024-08-11 18:45:05,214 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 18:45:10,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1234980.0, ans=0.2 2024-08-11 18:45:12,808 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.591e+01 2.941e+01 3.490e+01 1.489e+02, threshold=5.883e+01, percent-clipped=2.0 2024-08-11 18:45:33,936 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 18:45:39,687 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 18:45:50,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1235280.0, ans=0.0 2024-08-11 18:46:07,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7600, loss[loss=0.08358, beats_loss=0.0141, ecapa_loss=0.0001804, whisper_loss=0.06768, over 18562.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0001915, whisper_loss=0.09303, over 3831282.18 frames. ], batch size: 78, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:46:08,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1235380.0, ans=0.125 2024-08-11 18:46:43,355 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 18:47:12,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1235680.0, ans=0.1 2024-08-11 18:47:18,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1235780.0, ans=0.0 2024-08-11 18:47:25,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1235780.0, ans=0.1 2024-08-11 18:47:29,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1235780.0, ans=0.2 2024-08-11 18:47:31,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1235780.0, ans=0.125 2024-08-11 18:47:34,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7650, loss[loss=0.09347, beats_loss=0.01039, ecapa_loss=0.0002281, whisper_loss=0.0808, over 15733.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001921, whisper_loss=0.09308, over 3842884.25 frames. 
], batch size: 65, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:47:41,454 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 18:47:42,844 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 18:47:44,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2024-08-11 18:47:47,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1235880.0, ans=0.1 2024-08-11 18:47:48,360 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 18:47:56,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1235980.0, ans=0.1 2024-08-11 18:48:04,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.623e+01 3.033e+01 3.717e+01 6.248e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 18:48:22,178 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 18:48:23,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-11 18:48:29,238 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 18:48:38,425 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-11 18:48:51,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1236280.0, ans=0.1 2024-08-11 18:48:55,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1236280.0, ans=0.0 2024-08-11 18:49:00,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7700, loss[loss=0.09175, beats_loss=0.01249, ecapa_loss=0.0002071, whisper_loss=0.07719, over 21499.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001928, whisper_loss=0.09298, over 3856186.45 frames. ], batch size: 92, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:49:12,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1236380.0, ans=0.1 2024-08-11 18:49:15,411 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 18:49:52,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1236680.0, ans=0.125 2024-08-11 18:50:06,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1236780.0, ans=0.09899494936611666 2024-08-11 18:50:22,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7750, loss[loss=0.105, beats_loss=0.01189, ecapa_loss=0.0002008, whisper_loss=0.09112, over 22661.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01115, ecapa_loss=0.0001937, whisper_loss=0.09273, over 3861259.70 frames. ], batch size: 90, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:50:41,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. 
limit=15.0 2024-08-11 18:50:51,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1236980.0, ans=0.1 2024-08-11 18:50:52,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.693e+01 2.903e+01 3.373e+01 1.168e+02, threshold=5.806e+01, percent-clipped=1.0 2024-08-11 18:50:53,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1236980.0, ans=0.125 2024-08-11 18:50:57,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2024-08-11 18:51:03,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1237080.0, ans=0.125 2024-08-11 18:51:06,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1237080.0, ans=0.0 2024-08-11 18:51:27,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1237280.0, ans=0.125 2024-08-11 18:51:31,930 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 18:51:41,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7800, loss[loss=0.09912, beats_loss=0.01195, ecapa_loss=0.0001697, whisper_loss=0.08547, over 22373.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0112, ecapa_loss=0.0001923, whisper_loss=0.09236, over 3863881.78 frames. ], batch size: 92, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:51:47,959 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 18:51:56,221 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 18:51:59,720 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 18:52:29,644 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 18:52:38,633 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 18:52:45,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1237780.0, ans=0.09899494936611666 2024-08-11 18:52:50,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1237780.0, ans=0.0 2024-08-11 18:52:52,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1237780.0, ans=0.125 2024-08-11 18:52:57,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7850, loss[loss=0.09365, beats_loss=0.01163, ecapa_loss=0.0002113, whisper_loss=0.07991, over 21968.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001915, whisper_loss=0.09258, over 3873070.53 frames. ], batch size: 91, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:53:01,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1237880.0, ans=0.0 2024-08-11 18:53:05,811 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:53:07,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-11 18:53:13,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=15.0 2024-08-11 18:53:17,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1237980.0, ans=0.125 2024-08-11 18:53:21,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2024-08-11 18:53:22,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1237980.0, ans=0.125 2024-08-11 18:53:24,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.579e+01 2.865e+01 3.320e+01 8.816e+01, threshold=5.729e+01, percent-clipped=1.0 2024-08-11 18:54:13,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7900, loss[loss=0.1091, beats_loss=0.01192, ecapa_loss=0.0001512, whisper_loss=0.09567, over 17805.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0113, ecapa_loss=0.0001921, whisper_loss=0.09253, over 3888792.33 frames. ], batch size: 68, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:54:16,065 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 18:54:36,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-11 18:54:41,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1238580.0, ans=0.0 2024-08-11 18:54:55,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1238580.0, ans=0.125 2024-08-11 18:55:02,590 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 18:55:18,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-11 18:55:27,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 7950, loss[loss=0.0976, beats_loss=0.01157, ecapa_loss=0.0001929, whisper_loss=0.0841, over 18713.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01128, ecapa_loss=0.0001932, whisper_loss=0.09214, over 3876678.76 frames. ], batch size: 75, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:55:28,693 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 18:55:32,970 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 18:55:40,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1238980.0, ans=0.2 2024-08-11 18:55:44,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1238980.0, ans=0.125 2024-08-11 18:55:50,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1238980.0, ans=0.0 2024-08-11 18:55:52,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.748e+01 3.056e+01 3.459e+01 5.765e+01, threshold=6.112e+01, percent-clipped=1.0 2024-08-11 18:56:04,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=15.0 2024-08-11 18:56:05,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1239080.0, ans=0.0 2024-08-11 18:56:37,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.99 vs. limit=22.5 2024-08-11 18:56:37,565 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8000, loss[loss=0.09301, beats_loss=0.01093, ecapa_loss=0.0001948, whisper_loss=0.08013, over 15662.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01125, ecapa_loss=0.0001922, whisper_loss=0.09208, over 3883376.38 frames. ], batch size: 63, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:56:55,057 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 18:56:59,645 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 18:57:18,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-11 18:57:25,783 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 18:57:48,202 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8050, loss[loss=0.1195, beats_loss=0.008845, ecapa_loss=0.0002707, whisper_loss=0.1079, over 15138.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01128, ecapa_loss=0.0001924, whisper_loss=0.09147, over 3863149.63 frames. ], batch size: 61, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:57:59,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=10.0 2024-08-11 18:58:00,129 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-11 18:58:14,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.885e+01 3.265e+01 3.759e+01 1.907e+02, threshold=6.530e+01, percent-clipped=2.0 2024-08-11 18:58:18,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1240080.0, ans=0.2 2024-08-11 18:58:56,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8100, loss[loss=0.1163, beats_loss=0.009228, ecapa_loss=0.000192, whisper_loss=0.1052, over 23146.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01119, ecapa_loss=0.0001908, whisper_loss=0.09219, over 3882423.70 frames. ], batch size: 90, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:59:14,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1240480.0, ans=0.125 2024-08-11 18:59:20,739 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:59:21,644 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 18:59:21,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1240580.0, ans=0.0 2024-08-11 18:59:29,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1240580.0, ans=0.125 2024-08-11 18:59:34,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1240580.0, ans=0.125 2024-08-11 18:59:44,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1240680.0, ans=0.125 2024-08-11 18:59:44,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1240680.0, ans=0.125 2024-08-11 18:59:50,875 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 18:59:52,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-11 18:59:58,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-11 19:00:00,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1240780.0, ans=0.125 2024-08-11 19:00:02,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8150, loss[loss=0.09727, beats_loss=0.01342, ecapa_loss=0.0001569, whisper_loss=0.08228, over 23588.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01115, ecapa_loss=0.0001914, whisper_loss=0.0927, over 3916667.73 frames. 
], batch size: 94, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:00:09,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1240880.0, ans=0.125 2024-08-11 19:00:26,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2024-08-11 19:00:26,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.542e+01 2.871e+01 3.241e+01 4.432e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 19:00:52,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-11 19:00:53,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1241180.0, ans=0.125 2024-08-11 19:00:54,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.97 vs. limit=10.0 2024-08-11 19:01:00,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1241280.0, ans=0.125 2024-08-11 19:01:08,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8200, loss[loss=0.1364, beats_loss=0.00573, ecapa_loss=0.0002309, whisper_loss=0.1283, over 22246.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.011, ecapa_loss=0.0001935, whisper_loss=0.09418, over 3916332.07 frames. ], batch size: 87, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:01:23,137 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 19:01:25,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1241480.0, ans=0.125 2024-08-11 19:01:32,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1241480.0, ans=0.125 2024-08-11 19:01:42,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1241580.0, ans=0.05 2024-08-11 19:01:46,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1241680.0, ans=0.0 2024-08-11 19:01:49,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1241680.0, ans=0.0 2024-08-11 19:01:57,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1241680.0, ans=0.2 2024-08-11 19:02:04,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1241780.0, ans=0.0 2024-08-11 19:02:06,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1241780.0, ans=0.125 2024-08-11 19:02:09,229 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 19:02:13,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1241880.0, ans=0.2 2024-08-11 19:02:14,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8250, loss[loss=0.1175, beats_loss=0.01218, ecapa_loss=0.0001524, whisper_loss=0.1038, over 22435.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01109, ecapa_loss=0.0001912, whisper_loss=0.09401, over 3936235.20 frames. 
], batch size: 87, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:02:17,088 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 19:02:18,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1241880.0, ans=0.2 2024-08-11 19:02:23,660 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 19:02:26,636 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.846e-01 2024-08-11 19:02:29,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1241980.0, ans=0.2 2024-08-11 19:02:37,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.577e+01 2.823e+01 3.231e+01 7.611e+01, threshold=5.645e+01, percent-clipped=2.0 2024-08-11 19:02:41,770 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 19:03:00,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1242180.0, ans=0.1 2024-08-11 19:03:01,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1242180.0, ans=0.125 2024-08-11 19:03:01,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-11 19:03:19,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8300, loss[loss=0.06347, beats_loss=0.01284, ecapa_loss=0.0001995, whisper_loss=0.04864, over 17712.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001898, whisper_loss=0.09314, over 3877267.70 frames. 
], batch size: 73, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:03:21,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1242380.0, ans=0.125 2024-08-11 19:03:24,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-11 19:03:28,651 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.313e+00 2024-08-11 19:03:35,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1242480.0, ans=0.125 2024-08-11 19:03:52,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1242580.0, ans=0.0 2024-08-11 19:03:55,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1242580.0, ans=0.125 2024-08-11 19:04:03,280 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-11 19:04:23,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-11 19:04:24,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1242880.0, ans=0.125 2024-08-11 19:04:25,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8350, loss[loss=0.09817, beats_loss=0.01123, ecapa_loss=0.0002276, whisper_loss=0.08466, over 20866.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01117, ecapa_loss=0.0001901, whisper_loss=0.09315, over 3893082.49 frames. 
], batch size: 88, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:04:43,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2024-08-11 19:04:49,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.806e+01 3.050e+01 3.549e+01 1.399e+02, threshold=6.100e+01, percent-clipped=1.0 2024-08-11 19:05:01,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 19:05:09,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1243180.0, ans=0.125 2024-08-11 19:05:11,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1243180.0, ans=0.1 2024-08-11 19:05:11,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-11 19:05:15,190 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 19:05:26,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-11 19:05:30,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8400, loss[loss=0.08338, beats_loss=0.0116, ecapa_loss=0.0002528, whisper_loss=0.06926, over 14984.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01115, ecapa_loss=0.0001909, whisper_loss=0.09349, over 3912734.05 frames. 
], batch size: 63, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:05:31,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1243380.0, ans=0.2 2024-08-11 19:05:31,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2024-08-11 19:05:33,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1243380.0, ans=0.2 2024-08-11 19:05:41,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1243380.0, ans=0.125 2024-08-11 19:05:54,863 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 19:05:55,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1243480.0, ans=0.125 2024-08-11 19:05:59,849 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 19:06:35,827 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 19:06:37,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8450, loss[loss=0.09332, beats_loss=0.01323, ecapa_loss=0.0001855, whisper_loss=0.07823, over 15805.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001909, whisper_loss=0.09317, over 3892798.94 frames. 
], batch size: 64, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:06:37,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1243880.0, ans=0.125 2024-08-11 19:06:42,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1243880.0, ans=0.125 2024-08-11 19:06:44,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1243880.0, ans=0.0 2024-08-11 19:06:51,688 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 19:07:00,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.505e+01 2.848e+01 3.231e+01 4.188e+01, threshold=5.696e+01, percent-clipped=0.0 2024-08-11 19:07:26,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1244180.0, ans=0.125 2024-08-11 19:07:35,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1244280.0, ans=0.09899494936611666 2024-08-11 19:07:42,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8500, loss[loss=0.1053, beats_loss=0.01158, ecapa_loss=0.0001754, whisper_loss=0.09195, over 21743.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001911, whisper_loss=0.09336, over 3910016.41 frames. ], batch size: 87, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:07:43,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1244380.0, ans=0.125 2024-08-11 19:08:05,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. 
limit=15.0 2024-08-11 19:08:13,175 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 19:08:37,122 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 from AS 2024-08-11 19:08:41,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1244780.0, ans=0.125 2024-08-11 19:08:46,312 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 19:08:48,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-08-11 19:08:49,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8550, loss[loss=0.1377, beats_loss=0.007348, ecapa_loss=0.0002102, whisper_loss=0.1282, over 15327.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0001921, whisper_loss=0.09361, over 3898885.76 frames. ], batch size: 58, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:52,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1244880.0, ans=0.1 2024-08-11 19:08:58,522 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 from AS 2024-08-11 19:09:00,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1244880.0, ans=0.125 2024-08-11 19:09:07,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1244980.0, ans=0.0 2024-08-11 19:09:08,155 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
26 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 19:09:13,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.649e+01 3.008e+01 3.594e+01 2.630e+02, threshold=6.016e+01, percent-clipped=2.0 2024-08-11 19:09:18,315 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 19:09:29,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1245180.0, ans=0.0 2024-08-11 19:09:31,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=12.0 2024-08-11 19:09:35,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1245180.0, ans=0.0 2024-08-11 19:09:38,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.06 vs. limit=15.0 2024-08-11 19:09:40,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1245280.0, ans=0.125 2024-08-11 19:09:40,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1245280.0, ans=0.07 2024-08-11 19:09:51,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1245280.0, ans=0.125 2024-08-11 19:09:51,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2024-08-11 19:09:54,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8600, loss[loss=0.1172, beats_loss=0.008552, ecapa_loss=0.000219, whisper_loss=0.1064, over 22754.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01109, ecapa_loss=0.0001918, whisper_loss=0.0947, over 3911979.89 frames. ], batch size: 91, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:10:03,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1245380.0, ans=0.0 2024-08-11 19:10:04,149 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 19:10:09,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1245480.0, ans=0.1 2024-08-11 19:10:12,038 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 19:10:13,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1245480.0, ans=0.125 2024-08-11 19:10:14,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1245480.0, ans=0.125 2024-08-11 19:10:44,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1245680.0, ans=0.125 2024-08-11 19:10:45,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1245680.0, ans=0.1 2024-08-11 19:10:49,326 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 from AS 2024-08-11 19:10:58,262 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 from AS 2024-08-11 19:11:01,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8650, loss[loss=0.1185, beats_loss=0.009845, ecapa_loss=0.0002393, whisper_loss=0.1063, over 21355.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01112, ecapa_loss=0.0001914, whisper_loss=0.09451, over 3888648.34 frames. 
], batch size: 88, lr: 6.98e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:11:10,354 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 23 from Vox, 23 from AS 2024-08-11 19:11:16,176 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 from AS 2024-08-11 19:11:24,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1245980.0, ans=0.0 2024-08-11 19:11:24,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2024-08-11 19:11:26,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.702e+01 2.920e+01 3.348e+01 5.833e+01, threshold=5.840e+01, percent-clipped=0.0 2024-08-11 19:11:33,789 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 19:11:49,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1246180.0, ans=0.1 2024-08-11 19:11:51,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1246180.0, ans=0.2 2024-08-11 19:12:04,695 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.285e-01 2024-08-11 19:12:12,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8700, loss[loss=0.1163, beats_loss=0.01041, ecapa_loss=0.0001712, whisper_loss=0.1042, over 22471.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.0001922, whisper_loss=0.09389, over 3883148.36 frames. 
], batch size: 86, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:12:25,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1246380.0, ans=0.035 2024-08-11 19:12:27,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1246480.0, ans=0.125 2024-08-11 19:12:35,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.83 vs. limit=22.5 2024-08-11 19:13:17,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1246780.0, ans=0.1 2024-08-11 19:13:24,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1246780.0, ans=0.2 2024-08-11 19:13:31,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8750, loss[loss=0.08961, beats_loss=0.01199, ecapa_loss=0.0001948, whisper_loss=0.07567, over 22847.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01106, ecapa_loss=0.0001932, whisper_loss=0.09423, over 3848274.46 frames. ], batch size: 90, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:13:31,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1246880.0, ans=0.0 2024-08-11 19:13:39,515 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 19:13:48,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1246980.0, ans=0.125 2024-08-11 19:13:53,206 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
24 from LS+wenet, 16 from Vox, 32 from AS 2024-08-11 19:14:02,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.729e+01 3.149e+01 3.725e+01 7.299e+01, threshold=6.297e+01, percent-clipped=2.0 2024-08-11 19:14:38,954 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 19:14:41,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247280.0, ans=0.1 2024-08-11 19:14:54,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247280.0, ans=0.1 2024-08-11 19:14:56,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8800, loss[loss=0.08747, beats_loss=0.01606, ecapa_loss=0.0001212, whisper_loss=0.07019, over 17693.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0112, ecapa_loss=0.0001912, whisper_loss=0.09374, over 3864537.02 frames. ], batch size: 68, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:14:56,818 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 from AS 2024-08-11 19:15:12,637 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 19:15:30,064 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 14 from Vox, 21 from AS 2024-08-11 19:15:36,977 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 35 from Vox, 26 from AS 2024-08-11 19:15:40,501 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 33 from Vox, 32 from AS 2024-08-11 19:15:46,808 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 19:16:02,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1247680.0, ans=0.2 2024-08-11 19:16:02,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2024-08-11 19:16:05,132 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 from AS 2024-08-11 19:16:06,756 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 25 from Vox, 41 from AS 2024-08-11 19:16:21,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8850, loss[loss=0.1096, beats_loss=0.01236, ecapa_loss=0.000142, whisper_loss=0.09581, over 22812.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01123, ecapa_loss=0.0001901, whisper_loss=0.09332, over 3861146.76 frames. ], batch size: 88, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:16:22,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1247880.0, ans=0.0 2024-08-11 19:16:34,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1247880.0, ans=0.015 2024-08-11 19:16:43,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1247980.0, ans=0.0 2024-08-11 19:16:52,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.673e+01 2.972e+01 3.544e+01 5.278e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-11 19:16:55,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2024-08-11 19:17:14,465 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 15 from Vox, 25 from AS 2024-08-11 19:17:17,848 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 19:17:26,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1248180.0, ans=0.0 2024-08-11 19:17:47,788 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8900, loss[loss=0.1077, beats_loss=0.01314, ecapa_loss=0.0001885, whisper_loss=0.09265, over 15864.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0112, ecapa_loss=0.0001907, whisper_loss=0.09337, over 3851523.26 frames. ], batch size: 64, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:18:00,922 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 from AS 2024-08-11 19:18:21,912 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-11 19:18:51,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.36 vs. limit=15.0 2024-08-11 19:19:01,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1248780.0, ans=0.1 2024-08-11 19:19:02,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-11 19:19:12,002 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 16 from Vox, 37 from AS 2024-08-11 19:19:14,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 8950, loss[loss=0.09121, beats_loss=0.01197, ecapa_loss=0.000228, whisper_loss=0.07697, over 15527.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01125, ecapa_loss=0.0001906, whisper_loss=0.09257, over 3858365.63 frames. 
], batch size: 65, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:19:30,310 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 11 from Vox, 33 from AS 2024-08-11 19:19:33,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1248980.0, ans=0.125 2024-08-11 19:19:43,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1248980.0, ans=0.125 2024-08-11 19:19:44,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.588e+01 3.053e+01 3.414e+01 5.392e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-11 19:19:56,468 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 19:19:58,692 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 28 from LS+wenet, 17 from Vox, 22 from AS 2024-08-11 19:20:30,851 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS 2024-08-11 19:20:38,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9000, loss[loss=0.104, beats_loss=0.01201, ecapa_loss=0.0001671, whisper_loss=0.09027, over 23283.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01125, ecapa_loss=0.0001921, whisper_loss=0.0928, over 3866283.00 frames. ], batch size: 92, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:20:38,768 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 19:21:20,515 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on ASR_libri: loss=0.2565, beats_loss=0, ecapa_loss=0.0006239, whisper_loss=0.2503, over 922467.00 frames. 2024-08-11 19:21:39,215 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on SV_voxceleb1: loss=0.005312, beats_loss=0, ecapa_loss=0.0005312, whisper_loss=0, over 939242.00 frames. 
2024-08-11 19:23:36,295 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on AT_audioset: loss=0.02491, beats_loss=0.02491, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 19:23:36,298 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 19:23:55,895 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.483e-01 2024-08-11 19:24:07,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=12.0 2024-08-11 19:24:16,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1249580.0, ans=0.125 2024-08-11 19:24:21,727 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 from AS 2024-08-11 19:24:35,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1249680.0, ans=0.125 2024-08-11 19:24:52,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1249780.0, ans=0.125 2024-08-11 19:25:00,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9050, loss[loss=0.1109, beats_loss=0.009546, ecapa_loss=0.0002124, whisper_loss=0.09925, over 21281.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01121, ecapa_loss=0.0001929, whisper_loss=0.09318, over 3843207.93 frames. ], batch size: 89, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:25:12,352 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-11 19:25:12,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1249880.0, ans=0.0 2024-08-11 19:25:12,602 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:25:13,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2024-08-11 19:25:32,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.548e+01 2.793e+01 3.280e+01 4.630e+01, threshold=5.586e+01, percent-clipped=0.0 2024-08-11 19:26:08,893 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS 2024-08-11 19:26:12,643 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 12 from Vox, 28 from AS 2024-08-11 19:26:26,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9100, loss[loss=0.09712, beats_loss=0.01133, ecapa_loss=0.0001482, whisper_loss=0.08431, over 14985.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0112, ecapa_loss=0.0001926, whisper_loss=0.09302, over 3844109.07 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:26:37,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-11 19:26:57,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1250480.0, ans=0.1 2024-08-11 19:27:10,496 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 13 from Vox, 47 from AS 2024-08-11 19:27:16,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1250580.0, ans=0.125 2024-08-11 19:27:25,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1250680.0, ans=0.125 2024-08-11 19:27:33,851 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 19:27:52,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9150, loss[loss=0.09346, beats_loss=0.00978, ecapa_loss=0.0002156, whisper_loss=0.08152, over 20268.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01113, ecapa_loss=0.0001942, whisper_loss=0.09372, over 3873237.60 frames. ], batch size: 85, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:27:54,080 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.400e-01 2024-08-11 19:28:22,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.44 vs. limit=15.0 2024-08-11 19:28:23,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.605e+01 2.841e+01 3.221e+01 5.369e+01, threshold=5.683e+01, percent-clipped=0.0 2024-08-11 19:28:29,148 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 12 from Vox, 29 from AS 2024-08-11 19:28:44,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1251180.0, ans=0.125 2024-08-11 19:28:56,333 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
21 from LS+wenet, 14 from Vox, 27 from AS 2024-08-11 19:28:58,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1251280.0, ans=0.2 2024-08-11 19:29:00,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-08-11 19:29:06,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1251280.0, ans=0.05 2024-08-11 19:29:12,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9200, loss[loss=0.126, beats_loss=0.01127, ecapa_loss=0.0002157, whisper_loss=0.1126, over 21695.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01119, ecapa_loss=0.0001927, whisper_loss=0.09378, over 3902747.36 frames. ], batch size: 89, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:29:29,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1251480.0, ans=0.0 2024-08-11 19:29:52,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1251580.0, ans=0.125 2024-08-11 19:30:28,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9250, loss[loss=0.1033, beats_loss=0.01318, ecapa_loss=0.0001894, whisper_loss=0.08826, over 16458.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01116, ecapa_loss=0.0001939, whisper_loss=0.09363, over 3920230.73 frames. ], batch size: 67, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:30:29,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.52 vs. 
limit=15.0 2024-08-11 19:30:31,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1251880.0, ans=0.0 2024-08-11 19:30:40,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-11 19:30:52,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1251980.0, ans=0.0 2024-08-11 19:30:57,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.691e+01 2.985e+01 3.626e+01 6.428e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 19:30:58,244 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 19:31:15,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-11 19:31:17,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-11 19:31:18,381 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 from AS 2024-08-11 19:31:27,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2024-08-11 19:31:46,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9300, loss[loss=0.1106, beats_loss=0.01197, ecapa_loss=0.0002089, whisper_loss=0.09651, over 21769.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01115, ecapa_loss=0.0001932, whisper_loss=0.094, over 3921444.42 frames. ], batch size: 92, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:31:49,557 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-11 19:31:55,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0 2024-08-11 19:31:58,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1252380.0, ans=0.0 2024-08-11 19:31:59,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1252380.0, ans=0.2 2024-08-11 19:32:28,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0 2024-08-11 19:32:42,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1252680.0, ans=0.05 2024-08-11 19:32:43,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1252680.0, ans=0.125 2024-08-11 19:32:46,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1252680.0, ans=0.125 2024-08-11 19:32:54,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1252780.0, ans=0.0 2024-08-11 19:33:02,139 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 from AS 2024-08-11 19:33:05,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9350, loss[loss=0.09531, beats_loss=0.01202, ecapa_loss=0.0001851, whisper_loss=0.08144, over 20391.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01112, ecapa_loss=0.000192, whisper_loss=0.09397, over 3891875.15 frames. ], batch size: 82, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:33:24,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. 
limit=12.0 2024-08-11 19:33:32,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1252980.0, ans=0.0 2024-08-11 19:33:35,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.606e+01 3.008e+01 3.444e+01 5.189e+01, threshold=6.015e+01, percent-clipped=1.0 2024-08-11 19:33:35,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1253080.0, ans=0.0 2024-08-11 19:33:43,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1253080.0, ans=0.2 2024-08-11 19:34:09,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.41 vs. limit=10.0 2024-08-11 19:34:12,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-11 19:34:14,993 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 from AS 2024-08-11 19:34:16,607 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS 2024-08-11 19:34:18,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1253280.0, ans=0.0 2024-08-11 19:34:22,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9400, loss[loss=0.09946, beats_loss=0.01155, ecapa_loss=0.0002285, whisper_loss=0.08563, over 21781.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01115, ecapa_loss=0.0001926, whisper_loss=0.09345, over 3865685.23 frames. 
], batch size: 94, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:34:30,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1253380.0, ans=0.0 2024-08-11 19:34:50,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-11 19:34:50,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2024-08-11 19:35:04,004 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 19:35:19,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=15.0 2024-08-11 19:35:22,992 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-11 19:35:29,640 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 24 from Vox, 20 from AS 2024-08-11 19:35:34,266 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 from AS 2024-08-11 19:35:35,982 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 19:35:37,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9450, loss[loss=0.1092, beats_loss=0.01107, ecapa_loss=0.0001673, whisper_loss=0.09641, over 21434.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01117, ecapa_loss=0.000192, whisper_loss=0.09313, over 3898230.36 frames. 
], batch size: 84, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:36:01,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.693e+01 3.099e+01 3.778e+01 6.565e+01, threshold=6.199e+01, percent-clipped=1.0 2024-08-11 19:36:06,273 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 19:36:06,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1254080.0, ans=0.0 2024-08-11 19:36:09,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1254080.0, ans=0.125 2024-08-11 19:36:13,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1254080.0, ans=0.0 2024-08-11 19:36:24,100 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:36:41,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1254280.0, ans=0.2 2024-08-11 19:36:43,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9500, loss[loss=0.1028, beats_loss=0.01163, ecapa_loss=0.0001721, whisper_loss=0.08944, over 19609.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01112, ecapa_loss=0.0001926, whisper_loss=0.09394, over 3899824.75 frames. ], batch size: 77, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:36:48,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1254380.0, ans=0.125 2024-08-11 19:37:11,508 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
21 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-11 19:37:12,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1254580.0, ans=0.0 2024-08-11 19:37:13,895 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 28 from LS+wenet, 10 from Vox, 19 fro AS 2024-08-11 19:37:25,845 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 19:37:31,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1254680.0, ans=0.125 2024-08-11 19:37:32,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1254680.0, ans=0.0 2024-08-11 19:37:34,941 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 19:37:46,809 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 19:37:49,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9550, loss[loss=0.1087, beats_loss=0.01367, ecapa_loss=0.0001541, whisper_loss=0.09352, over 21355.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01115, ecapa_loss=0.0001934, whisper_loss=0.09338, over 3871730.33 frames. ], batch size: 86, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:50,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-11 19:37:59,700 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 19:38:05,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1254980.0, ans=0.125 2024-08-11 19:38:10,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-08-11 19:38:13,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.508e+01 2.726e+01 3.017e+01 8.338e+01, threshold=5.453e+01, percent-clipped=1.0 2024-08-11 19:38:35,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1255180.0, ans=0.05 2024-08-11 19:38:50,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-11 19:38:53,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1255380.0, ans=10.0 2024-08-11 19:38:53,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1255380.0, ans=0.2 2024-08-11 19:38:54,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9600, loss[loss=0.1038, beats_loss=0.009934, ecapa_loss=0.0001989, whisper_loss=0.09191, over 15328.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01108, ecapa_loss=0.0001929, whisper_loss=0.09356, over 3887801.53 frames. ], batch size: 61, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:38:56,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. 
limit=10.0 2024-08-11 19:39:04,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1255380.0, ans=0.0 2024-08-11 19:39:08,298 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 15 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-11 19:39:10,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1255480.0, ans=0.1 2024-08-11 19:39:25,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1255580.0, ans=0.125 2024-08-11 19:39:28,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1255580.0, ans=0.125 2024-08-11 19:39:42,222 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 19:39:51,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1255780.0, ans=0.125 2024-08-11 19:39:57,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2024-08-11 19:39:59,488 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 19:40:02,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9650, loss[loss=0.09242, beats_loss=0.01237, ecapa_loss=0.0001677, whisper_loss=0.07837, over 23002.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01108, ecapa_loss=0.000193, whisper_loss=0.09349, over 3878826.31 frames. 
], batch size: 93, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:40:06,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1255880.0, ans=0.0 2024-08-11 19:40:14,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1255980.0, ans=0.0 2024-08-11 19:40:27,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.826e+01 3.085e+01 3.592e+01 1.036e+02, threshold=6.169e+01, percent-clipped=1.0 2024-08-11 19:40:35,874 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 19:40:49,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256180.0, ans=0.1 2024-08-11 19:40:56,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1256280.0, ans=0.2 2024-08-11 19:41:08,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9700, loss[loss=0.115, beats_loss=0.01251, ecapa_loss=0.0001744, whisper_loss=0.1007, over 19331.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01102, ecapa_loss=0.0001934, whisper_loss=0.09386, over 3864118.90 frames. ], batch size: 78, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:41:11,499 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 19:41:13,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-11 19:41:19,506 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 19:41:20,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.63 vs. 
limit=15.0 2024-08-11 19:41:31,811 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:41:39,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1256580.0, ans=0.05 2024-08-11 19:41:53,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-11 19:42:04,408 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 19:42:14,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9750, loss[loss=0.1031, beats_loss=0.01379, ecapa_loss=0.000155, whisper_loss=0.08781, over 16142.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01122, ecapa_loss=0.0001921, whisper_loss=0.09235, over 3844563.08 frames. ], batch size: 63, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:42:30,338 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:42:40,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.576e+01 2.817e+01 3.279e+01 5.572e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-11 19:42:40,870 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 19:42:47,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1257080.0, ans=0.0 2024-08-11 19:42:48,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1257080.0, ans=0.125 2024-08-11 19:42:50,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.06 vs. 
limit=22.5 2024-08-11 19:43:02,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1257180.0, ans=0.125 2024-08-11 19:43:14,476 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 19:43:21,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9800, loss[loss=0.1121, beats_loss=0.01273, ecapa_loss=0.0001527, whisper_loss=0.09788, over 20462.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01117, ecapa_loss=0.0001923, whisper_loss=0.09247, over 3843048.81 frames. ], batch size: 79, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:43:26,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1257380.0, ans=0.0 2024-08-11 19:43:29,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1257380.0, ans=0.0 2024-08-11 19:43:30,757 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:43:44,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1257480.0, ans=0.1 2024-08-11 19:43:47,149 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 19:43:51,219 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 35 from Vox, 17 fro AS 2024-08-11 19:43:52,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1257580.0, ans=0.1 2024-08-11 19:43:53,575 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 19:44:00,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1257680.0, ans=0.125 2024-08-11 19:44:04,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1257680.0, ans=0.125 2024-08-11 19:44:08,251 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 19:44:16,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1257780.0, ans=0.07 2024-08-11 19:44:19,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1257780.0, ans=0.125 2024-08-11 19:44:21,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1257780.0, ans=0.125 2024-08-11 19:44:24,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257780.0, ans=0.1 2024-08-11 19:44:26,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9850, loss[loss=0.1249, beats_loss=0.01097, ecapa_loss=0.0001988, whisper_loss=0.112, over 23363.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001942, whisper_loss=0.09285, over 3854855.66 frames. 
], batch size: 93, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:44:47,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1257980.0, ans=0.0 2024-08-11 19:44:51,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.685e+01 3.037e+01 3.617e+01 4.839e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 19:44:57,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1258080.0, ans=0.04949747468305833 2024-08-11 19:45:04,591 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 19:45:08,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-11 19:45:19,263 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 19:45:24,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1258280.0, ans=0.0 2024-08-11 19:45:31,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9900, loss[loss=0.08321, beats_loss=0.01251, ecapa_loss=0.0002299, whisper_loss=0.0684, over 15483.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0111, ecapa_loss=0.0001944, whisper_loss=0.09398, over 3858153.05 frames. ], batch size: 65, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:45:49,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. 
limit=22.5 2024-08-11 19:46:00,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1258580.0, ans=0.125 2024-08-11 19:46:06,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1258580.0, ans=0.125 2024-08-11 19:46:23,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1258780.0, ans=0.125 2024-08-11 19:46:33,995 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-11 19:46:34,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1258780.0, ans=0.125 2024-08-11 19:46:36,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 9950, loss[loss=0.1254, beats_loss=0.01084, ecapa_loss=0.0001565, whisper_loss=0.113, over 23867.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01114, ecapa_loss=0.000193, whisper_loss=0.09378, over 3879677.61 frames. ], batch size: 93, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:46:54,162 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 19:47:00,712 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 19:47:01,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.542e+01 2.819e+01 3.280e+01 8.897e+01, threshold=5.637e+01, percent-clipped=1.0 2024-08-11 19:47:23,104 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 19:47:32,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1259280.0, ans=0.0 2024-08-11 19:47:42,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10000, loss[loss=0.09381, beats_loss=0.01354, ecapa_loss=0.0001932, whisper_loss=0.07834, over 21265.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0112, ecapa_loss=0.0001923, whisper_loss=0.09271, over 3861492.68 frames. ], batch size: 88, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:47:55,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.57 vs. limit=22.5 2024-08-11 19:47:56,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1259480.0, ans=0.125 2024-08-11 19:48:08,028 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-11 19:48:10,700 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 19:48:12,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1259580.0, ans=0.07 2024-08-11 19:48:16,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259580.0, ans=0.1 2024-08-11 19:48:25,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1259680.0, ans=0.125 2024-08-11 19:48:32,360 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-11 19:48:47,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10050, loss[loss=0.08732, beats_loss=0.01061, ecapa_loss=0.0001866, whisper_loss=0.07484, over 19537.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001917, whisper_loss=0.09264, over 3863256.25 frames. ], batch size: 79, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:48:59,417 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 19:49:02,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1259980.0, ans=0.0 2024-08-11 19:49:12,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.712e+01 3.023e+01 3.510e+01 5.543e+01, threshold=6.045e+01, percent-clipped=0.0 2024-08-11 19:49:14,068 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-11 19:49:17,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1260080.0, ans=0.125 2024-08-11 19:49:27,148 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 19:49:29,657 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 19:49:33,263 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 19:49:34,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1260180.0, ans=0.125 2024-08-11 19:49:35,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2024-08-11 19:49:38,726 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 19:49:40,317 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 19:49:45,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.028e+05 2024-08-11 19:49:46,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-11 19:49:46,912 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 19:49:49,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1260280.0, ans=0.125 2024-08-11 19:49:52,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10100, loss[loss=0.1326, beats_loss=0.009445, ecapa_loss=0.0002602, whisper_loss=0.1206, over 22072.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01118, ecapa_loss=0.0001925, whisper_loss=0.09268, over 3888412.27 frames. ], batch size: 90, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:49:57,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1260380.0, ans=0.125 2024-08-11 19:50:40,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1260680.0, ans=0.0 2024-08-11 19:50:44,471 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 19:50:58,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10150, loss[loss=0.08714, beats_loss=0.014, ecapa_loss=0.0001624, whisper_loss=0.07151, over 22204.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.0001929, whisper_loss=0.09278, over 3902923.62 frames. 
], batch size: 90, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:51:23,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.653e+01 2.999e+01 3.558e+01 5.617e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 19:51:31,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1261080.0, ans=0.0 2024-08-11 19:52:00,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1261280.0, ans=0.1 2024-08-11 19:52:03,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10200, loss[loss=0.09836, beats_loss=0.01359, ecapa_loss=0.0001807, whisper_loss=0.08296, over 19114.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01114, ecapa_loss=0.000193, whisper_loss=0.09332, over 3874517.63 frames. ], batch size: 78, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:52:09,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1261380.0, ans=0.1 2024-08-11 19:52:33,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-11 19:52:43,716 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 19:52:47,519 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 19:53:00,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1261780.0, ans=0.125 2024-08-11 19:53:07,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1261780.0, ans=0.0 2024-08-11 19:53:07,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1261780.0, ans=0.125 2024-08-11 19:53:09,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10250, loss[loss=0.1276, beats_loss=0.007429, ecapa_loss=0.0001732, whisper_loss=0.1184, over 15358.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01112, ecapa_loss=0.0001932, whisper_loss=0.09366, over 3889120.48 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:53:17,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2024-08-11 19:53:33,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.624e+01 2.927e+01 3.242e+01 1.065e+02, threshold=5.855e+01, percent-clipped=3.0 2024-08-11 19:53:43,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1262080.0, ans=0.125 2024-08-11 19:53:53,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1262180.0, ans=0.125 2024-08-11 19:54:10,408 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 19:54:15,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10300, loss[loss=0.1077, beats_loss=0.01156, ecapa_loss=0.0002428, whisper_loss=0.09369, over 14384.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01115, ecapa_loss=0.0001936, whisper_loss=0.09347, over 3865921.15 frames. ], batch size: 59, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:54:22,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1262380.0, ans=0.0 2024-08-11 19:54:45,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1262580.0, ans=0.2 2024-08-11 19:54:59,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2024-08-11 19:55:01,638 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 19:55:11,681 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 19:55:14,679 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 19:55:20,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1262880.0, ans=0.2 2024-08-11 19:55:20,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10350, loss[loss=0.09929, beats_loss=0.01044, ecapa_loss=0.0002112, whisper_loss=0.08674, over 20477.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01127, ecapa_loss=0.000192, whisper_loss=0.09297, over 3880284.80 frames. 
], batch size: 85, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:55:26,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1262880.0, ans=0.0 2024-08-11 19:55:27,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1262880.0, ans=0.09899494936611666 2024-08-11 19:55:31,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1262880.0, ans=0.125 2024-08-11 19:55:36,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2024-08-11 19:55:45,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.735e+01 3.032e+01 3.459e+01 9.732e+01, threshold=6.064e+01, percent-clipped=1.0 2024-08-11 19:55:48,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1263080.0, ans=0.125 2024-08-11 19:55:58,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.24 vs. limit=22.5 2024-08-11 19:56:00,485 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 19:56:04,755 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 19:56:26,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10400, loss[loss=0.1029, beats_loss=0.01295, ecapa_loss=0.0001844, whisper_loss=0.08811, over 18434.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0113, ecapa_loss=0.0001905, whisper_loss=0.09281, over 3907165.55 frames. 
], batch size: 75, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:56:33,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1263380.0, ans=0.1 2024-08-11 19:56:34,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1263380.0, ans=0.0 2024-08-11 19:56:37,208 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 19:56:44,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1263480.0, ans=0.125 2024-08-11 19:56:56,426 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 19:57:06,924 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 19:57:14,413 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 19:57:21,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1263780.0, ans=0.2 2024-08-11 19:57:22,811 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 19:57:31,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10450, loss[loss=0.1175, beats_loss=0.01128, ecapa_loss=0.0001859, whisper_loss=0.1043, over 20211.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01127, ecapa_loss=0.0001902, whisper_loss=0.0925, over 3900561.71 frames. ], batch size: 80, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:57:32,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. 
limit=10.0 2024-08-11 19:57:34,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1263880.0, ans=0.2 2024-08-11 19:57:36,095 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 19:57:36,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.14 vs. limit=22.5 2024-08-11 19:57:40,041 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 19:57:56,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.575e+01 2.883e+01 3.290e+01 7.177e+01, threshold=5.767e+01, percent-clipped=1.0 2024-08-11 19:58:02,224 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 19:58:02,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1264080.0, ans=0.125 2024-08-11 19:58:08,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1264080.0, ans=0.0 2024-08-11 19:58:15,151 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 19:58:32,375 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.646e-02 2024-08-11 19:58:36,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10500, loss[loss=0.09562, beats_loss=0.01117, ecapa_loss=0.0001701, whisper_loss=0.08276, over 22585.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01128, ecapa_loss=0.0001896, whisper_loss=0.09206, over 3901988.79 frames. 
], batch size: 90, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 19:58:44,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1264380.0, ans=0.125
2024-08-11 19:58:45,187 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 from AS
2024-08-11 19:58:45,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1264380.0, ans=0.0
2024-08-11 19:59:07,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=15.0
2024-08-11 19:59:13,173 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.143e+02
2024-08-11 19:59:22,470 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 from AS
2024-08-11 19:59:28,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1264680.0, ans=0.04949747468305833
2024-08-11 19:59:30,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1264780.0, ans=0.0
2024-08-11 19:59:36,972 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 from AS
2024-08-11 19:59:38,515 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 from AS
2024-08-11 19:59:39,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1264780.0, ans=0.125
2024-08-11 19:59:43,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10550, loss[loss=0.1019, beats_loss=0.01353, ecapa_loss=0.0001891, whisper_loss=0.0865, over 21775.00 frames.
], tot_loss[loss=0.1049, beats_loss=0.01134, ecapa_loss=0.0001901, whisper_loss=0.09163, over 3857076.27 frames. ], batch size: 92, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 19:59:50,141 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-11 20:00:00,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=12.0
2024-08-11 20:00:08,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.610e+01 2.840e+01 3.443e+01 6.303e+01, threshold=5.679e+01, percent-clipped=1.0
2024-08-11 20:00:16,545 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 from AS
2024-08-11 20:00:19,008 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 from AS
2024-08-11 20:00:39,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1265280.0, ans=0.125
2024-08-11 20:00:49,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10600, loss[loss=0.1064, beats_loss=0.008727, ecapa_loss=0.000216, whisper_loss=0.09556, over 21031.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01129, ecapa_loss=0.0001913, whisper_loss=0.09151, over 3871952.24 frames. ], batch size: 84, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:00:57,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.56 vs.
limit=22.5
2024-08-11 20:01:09,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1265480.0, ans=0.09899494936611666
2024-08-11 20:01:21,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1265580.0, ans=0.2
2024-08-11 20:01:41,618 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-11 20:01:46,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0
2024-08-11 20:01:54,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1265880.0, ans=0.125
2024-08-11 20:01:54,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1265880.0, ans=0.0
2024-08-11 20:01:55,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10650, loss[loss=0.09863, beats_loss=0.01094, ecapa_loss=0.0001678, whisper_loss=0.08602, over 17422.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01127, ecapa_loss=0.0001895, whisper_loss=0.09221, over 3882363.11 frames.
], batch size: 70, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:02:05,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265880.0, ans=0.1
2024-08-11 20:02:09,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1265980.0, ans=0.1
2024-08-11 20:02:21,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.841e+01 3.157e+01 3.812e+01 6.518e+01, threshold=6.314e+01, percent-clipped=4.0
2024-08-11 20:02:27,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0
2024-08-11 20:02:34,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1266180.0, ans=0.0
2024-08-11 20:02:45,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0
2024-08-11 20:02:45,616 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 from AS
2024-08-11 20:02:50,868 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 from AS
2024-08-11 20:02:52,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1266280.0, ans=0.125
2024-08-11 20:03:01,651 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 35 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 20:03:02,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10700, loss[loss=0.1424, beats_loss=0.007588, ecapa_loss=0.0001788, whisper_loss=0.133, over 21641.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001885, whisper_loss=0.09305, over 3867934.99 frames.
], batch size: 78, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:03:07,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1266380.0, ans=0.125
2024-08-11 20:03:12,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1266380.0, ans=0.125
2024-08-11 20:03:21,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1266480.0, ans=0.125
2024-08-11 20:03:21,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1266480.0, ans=0.125
2024-08-11 20:03:27,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1266480.0, ans=0.0
2024-08-11 20:03:31,189 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 7 from Vox, 37 from AS
2024-08-11 20:03:37,991 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 20:03:38,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1266580.0, ans=0.0
2024-08-11 20:03:46,202 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 from AS
2024-08-11 20:03:58,739 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 from AS
2024-08-11 20:04:05,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1266780.0, ans=0.125
2024-08-11 20:04:07,246 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
28 from LS+wenet, 20 from Vox, 31 from AS
2024-08-11 20:04:09,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10750, loss[loss=0.0872, beats_loss=0.01202, ecapa_loss=0.0001865, whisper_loss=0.07331, over 15344.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01105, ecapa_loss=0.0001896, whisper_loss=0.09414, over 3865962.47 frames. ], batch size: 64, lr: 6.93e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:04:10,169 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 20:04:10,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1266880.0, ans=0.1
2024-08-11 20:04:18,933 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS
2024-08-11 20:04:27,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1266980.0, ans=0.0
2024-08-11 20:04:36,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.628e+01 2.928e+01 3.321e+01 7.388e+01, threshold=5.856e+01, percent-clipped=1.0
2024-08-11 20:04:45,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1267080.0, ans=0.125
2024-08-11 20:04:50,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0
2024-08-11 20:04:51,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1267180.0, ans=0.0
2024-08-11 20:04:52,185 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
34 from LS+wenet, 17 from Vox, 39 from AS
2024-08-11 20:04:57,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1267180.0, ans=0.125
2024-08-11 20:05:05,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1267280.0, ans=0.125
2024-08-11 20:05:19,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10800, loss[loss=0.1192, beats_loss=0.008372, ecapa_loss=0.0002306, whisper_loss=0.1085, over 21357.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.0001889, whisper_loss=0.09375, over 3862510.52 frames. ], batch size: 82, lr: 6.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:05:22,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1267380.0, ans=0.125
2024-08-11 20:05:23,030 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 from AS
2024-08-11 20:05:23,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1267380.0, ans=0.125
2024-08-11 20:05:23,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1267380.0, ans=0.125
2024-08-11 20:05:24,239 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-11 20:05:42,293 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
19 from LS+wenet, 14 from Vox, 21 from AS
2024-08-11 20:05:42,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1267480.0, ans=0.2
2024-08-11 20:05:50,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1267580.0, ans=0.125
2024-08-11 20:05:56,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1267580.0, ans=0.2
2024-08-11 20:06:03,082 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 from AS
2024-08-11 20:06:24,613 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 19 from Vox, 20 from AS
2024-08-11 20:06:32,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1267780.0, ans=0.125
2024-08-11 20:06:35,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10850, loss[loss=0.09547, beats_loss=0.01035, ecapa_loss=0.0002295, whisper_loss=0.08282, over 17460.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01114, ecapa_loss=0.0001901, whisper_loss=0.09408, over 3873027.14 frames. ], batch size: 68, lr: 6.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:06:51,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0
2024-08-11 20:07:03,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1267980.0, ans=0.0
2024-08-11 20:07:05,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.647e+01 2.915e+01 3.241e+01 5.191e+01, threshold=5.831e+01, percent-clipped=0.0
2024-08-11 20:07:15,385 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
32 from LS+wenet, 22 from Vox, 36 from AS
2024-08-11 20:07:19,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1268080.0, ans=0.0
2024-08-11 20:07:48,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1268280.0, ans=0.0
2024-08-11 20:07:50,192 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.609e-03
2024-08-11 20:07:53,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10900, loss[loss=0.09961, beats_loss=0.0116, ecapa_loss=0.0001809, whisper_loss=0.08621, over 15135.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01111, ecapa_loss=0.0001896, whisper_loss=0.09404, over 3867961.23 frames. ], batch size: 59, lr: 6.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:08:05,484 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-11 20:08:10,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1268380.0, ans=0.95
2024-08-11 20:08:36,708 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.054e-01
2024-08-11 20:08:43,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.23 vs. limit=15.0
2024-08-11 20:08:57,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1268680.0, ans=0.125
2024-08-11 20:09:13,637 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 26 from Vox, 18 from AS
2024-08-11 20:09:19,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 10950, loss[loss=0.111, beats_loss=0.01223, ecapa_loss=0.0002007, whisper_loss=0.09672, over 19920.00 frames.
], tot_loss[loss=0.1068, beats_loss=0.01116, ecapa_loss=0.0001894, whisper_loss=0.09375, over 3897753.57 frames. ], batch size: 81, lr: 6.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 20:09:42,790 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 from AS
2024-08-11 20:09:43,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1268980.0, ans=0.2
2024-08-11 20:09:57,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1268980.0, ans=0.0
2024-08-11 20:09:57,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1268980.0, ans=0.0
2024-08-11 20:10:01,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.627e+01 3.007e+01 3.464e+01 1.236e+02, threshold=6.014e+01, percent-clipped=3.0
2024-08-11 20:10:26,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2024-08-11 20:11:08,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11000, loss[loss=0.1194, beats_loss=0.009665, ecapa_loss=0.0002142, whisper_loss=0.1076, over 21181.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01115, ecapa_loss=0.0001926, whisper_loss=0.09333, over 3915576.36 frames.
], batch size: 86, lr: 6.92e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:11:18,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1269380.0, ans=0.125
2024-08-11 20:11:21,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1269380.0, ans=0.0
2024-08-11 20:11:37,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=12.0
2024-08-11 20:11:40,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1269480.0, ans=0.95
2024-08-11 20:11:43,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1269480.0, ans=0.125
2024-08-11 20:11:43,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1269480.0, ans=0.125
2024-08-11 20:11:46,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1269580.0, ans=0.0
2024-08-11 20:11:51,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1269580.0, ans=0.0
2024-08-11 20:12:00,586 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 14 from Vox, 30 from AS
2024-08-11 20:12:32,277 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 from AS
2024-08-11 20:12:32,869 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 20:12:49,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11050, loss[loss=0.1191, beats_loss=0.009556, ecapa_loss=0.0001664, whisper_loss=0.1079, over 21133.00 frames.
], tot_loss[loss=0.1069, beats_loss=0.01114, ecapa_loss=0.0001938, whisper_loss=0.09385, over 3921201.17 frames. ], batch size: 78, lr: 6.92e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:13:01,765 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 from AS
2024-08-11 20:13:07,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1269880.0, ans=0.0
2024-08-11 20:13:11,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1269980.0, ans=0.09899494936611666
2024-08-11 20:13:15,400 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 from AS
2024-08-11 20:13:22,913 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-11 20:13:29,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1269980.0, ans=0.2
2024-08-11 20:13:33,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1269980.0, ans=0.0
2024-08-11 20:13:33,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.512e+01 2.845e+01 3.437e+01 6.269e+01, threshold=5.689e+01, percent-clipped=1.0
2024-08-11 20:13:42,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-08-11 20:13:49,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1270080.0, ans=0.125
2024-08-11 20:14:03,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1270180.0, ans=0.07
2024-08-11 20:14:35,202 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
23 from LS+wenet, 26 from Vox, 41 from AS
2024-08-11 20:14:46,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11100, loss[loss=0.08962, beats_loss=0.01105, ecapa_loss=0.0002273, whisper_loss=0.07629, over 19344.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.000194, whisper_loss=0.09325, over 3900696.75 frames. ], batch size: 81, lr: 6.92e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:15:13,210 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS
2024-08-11 20:15:27,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1270480.0, ans=0.125
2024-08-11 20:15:51,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1270580.0, ans=0.2
2024-08-11 20:15:59,714 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 from AS
2024-08-11 20:16:00,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1270680.0, ans=0.2
2024-08-11 20:16:46,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1270880.0, ans=0.0
2024-08-11 20:16:47,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11150, loss[loss=0.1157, beats_loss=0.01082, ecapa_loss=0.0001867, whisper_loss=0.103, over 20362.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01105, ecapa_loss=0.0001928, whisper_loss=0.09339, over 3901743.72 frames. ], batch size: 79, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:17:01,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1270880.0, ans=0.125
2024-08-11 20:17:15,684 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts.
21 from LS+wenet, 21 from Vox, 23 from AS
2024-08-11 20:17:32,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1270980.0, ans=0.2
2024-08-11 20:17:39,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.507e+01 2.811e+01 3.221e+01 4.609e+01, threshold=5.623e+01, percent-clipped=0.0
2024-08-11 20:17:53,051 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 from AS
2024-08-11 20:17:55,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1271080.0, ans=0.125
2024-08-11 20:17:55,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1271080.0, ans=0.2
2024-08-11 20:18:24,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1271280.0, ans=0.125
2024-08-11 20:18:29,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1271280.0, ans=0.2
2024-08-11 20:18:32,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1271280.0, ans=0.0
2024-08-11 20:18:36,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11200, loss[loss=0.09245, beats_loss=0.01227, ecapa_loss=0.0001649, whisper_loss=0.07852, over 21054.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001922, whisper_loss=0.09297, over 3893541.51 frames. ], batch size: 87, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:18:45,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0
2024-08-11 20:19:09,179 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
34 from LS+wenet, 27 from Vox, 29 from AS
2024-08-11 20:19:12,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1271580.0, ans=0.07
2024-08-11 20:19:18,226 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 from AS
2024-08-11 20:19:24,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1271580.0, ans=0.125
2024-08-11 20:19:33,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=32.66 vs. limit=22.5
2024-08-11 20:20:05,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11250, loss[loss=0.1062, beats_loss=0.00975, ecapa_loss=0.0001903, whisper_loss=0.09456, over 18618.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001925, whisper_loss=0.09313, over 3894523.65 frames. ], batch size: 72, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:20:21,315 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 20:20:37,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1271980.0, ans=0.125
2024-08-11 20:20:38,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.603e+01 2.926e+01 3.414e+01 6.111e+01, threshold=5.851e+01, percent-clipped=1.0
2024-08-11 20:20:50,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1272080.0, ans=0.1
2024-08-11 20:21:03,928 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS
2024-08-11 20:21:16,293 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
20 from LS+wenet, 15 from Vox, 22 from AS
2024-08-11 20:21:17,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5
2024-08-11 20:21:31,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1272280.0, ans=0.1
2024-08-11 20:21:34,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11300, loss[loss=0.1093, beats_loss=0.01102, ecapa_loss=0.0002203, whisper_loss=0.09608, over 17752.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01113, ecapa_loss=0.0001906, whisper_loss=0.09309, over 3881424.94 frames. ], batch size: 72, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:22:04,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5
2024-08-11 20:22:27,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1272580.0, ans=0.0
2024-08-11 20:22:43,155 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 18 from Vox, 41 from AS
2024-08-11 20:23:04,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11350, loss[loss=0.1003, beats_loss=0.01295, ecapa_loss=0.0001318, whisper_loss=0.08598, over 20139.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001898, whisper_loss=0.09259, over 3870766.66 frames. ], batch size: 77, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:23:13,935 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 from AS
2024-08-11 20:23:26,865 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS
2024-08-11 20:23:34,691 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-11 20:23:35,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.62 vs. limit=22.5
2024-08-11 20:23:39,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.545e+01 2.892e+01 3.550e+01 1.179e+02, threshold=5.785e+01, percent-clipped=1.0
2024-08-11 20:23:42,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1273080.0, ans=0.125
2024-08-11 20:24:01,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1273180.0, ans=0.1
2024-08-11 20:24:23,923 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 20:24:35,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11400, loss[loss=0.09997, beats_loss=0.01007, ecapa_loss=0.0002339, whisper_loss=0.08756, over 15234.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01114, ecapa_loss=0.0001895, whisper_loss=0.09379, over 3864875.04 frames. ], batch size: 62, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:24:35,572 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 from AS
2024-08-11 20:24:40,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1273380.0, ans=0.0
2024-08-11 20:24:40,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0
2024-08-11 20:25:11,382 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
22 from LS+wenet, 11 from Vox, 23 from AS
2024-08-11 20:25:11,873 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.460e-01
2024-08-11 20:25:27,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1273680.0, ans=0.125
2024-08-11 20:25:32,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=12.0
2024-08-11 20:25:35,548 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-11 20:25:36,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5
2024-08-11 20:26:03,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11450, loss[loss=0.1098, beats_loss=0.01213, ecapa_loss=0.000186, whisper_loss=0.09577, over 19148.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01119, ecapa_loss=0.0001902, whisper_loss=0.09373, over 3889199.87 frames. ], batch size: 75, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:26:11,498 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-11 20:26:21,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1273980.0, ans=0.0
2024-08-11 20:26:38,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.725e+01 3.153e+01 3.598e+01 9.857e+01, threshold=6.305e+01, percent-clipped=2.0
2024-08-11 20:26:44,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1274080.0, ans=0.05
2024-08-11 20:27:33,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11500, loss[loss=0.1087, beats_loss=0.009279, ecapa_loss=0.0002326, whisper_loss=0.0971, over 13959.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0001898, whisper_loss=0.09294, over 3880884.08 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:27:55,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1274480.0, ans=0.07
2024-08-11 20:27:55,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=12.0
2024-08-11 20:28:02,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.72 vs. limit=15.0
2024-08-11 20:28:06,442 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS
2024-08-11 20:28:13,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=15.0
2024-08-11 20:28:24,546 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
16 from LS+wenet, 25 from Vox, 28 from AS
2024-08-11 20:28:44,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=12.0
2024-08-11 20:29:04,096 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS
2024-08-11 20:29:07,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11550, loss[loss=0.09241, beats_loss=0.01204, ecapa_loss=0.0001667, whisper_loss=0.0787, over 21027.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01126, ecapa_loss=0.0001899, whisper_loss=0.0931, over 3885309.54 frames. ], batch size: 81, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:29:12,522 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS
2024-08-11 20:29:27,134 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 from AS
2024-08-11 20:29:33,540 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.709e-01
2024-08-11 20:29:35,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1274980.0, ans=0.125
2024-08-11 20:29:43,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.690e+01 2.944e+01 3.463e+01 4.757e+01, threshold=5.888e+01, percent-clipped=0.0
2024-08-11 20:29:59,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1275080.0, ans=0.2
2024-08-11 20:30:02,772 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts.
21 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 20:30:14,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1275180.0, ans=0.2
2024-08-11 20:30:37,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11600, loss[loss=0.1021, beats_loss=0.009481, ecapa_loss=0.0002486, whisper_loss=0.09018, over 15781.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0112, ecapa_loss=0.0001916, whisper_loss=0.09281, over 3882098.43 frames. ], batch size: 65, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:30:38,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1275380.0, ans=0.1
2024-08-11 20:30:50,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
2024-08-11 20:31:26,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1275580.0, ans=0.125
2024-08-11 20:31:58,536 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.952e+02
2024-08-11 20:32:06,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11650, loss[loss=0.1158, beats_loss=0.009159, ecapa_loss=0.0002323, whisper_loss=0.1043, over 17590.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0001914, whisper_loss=0.09308, over 3911549.51 frames. ], batch size: 75, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:32:44,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.564e+01 2.809e+01 3.170e+01 4.570e+01, threshold=5.617e+01, percent-clipped=0.0
2024-08-11 20:33:42,163 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
15 from LS+wenet, 18 from Vox, 33 from AS
2024-08-11 20:33:43,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11700, loss[loss=0.08259, beats_loss=0.01383, ecapa_loss=0.0002122, whisper_loss=0.06664, over 15988.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01122, ecapa_loss=0.0001911, whisper_loss=0.09343, over 3932204.91 frames. ], batch size: 66, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:33:53,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1276380.0, ans=0.2
2024-08-11 20:33:56,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1276380.0, ans=0.125
2024-08-11 20:33:58,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1276380.0, ans=0.0
2024-08-11 20:34:32,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1276580.0, ans=0.2
2024-08-11 20:34:42,455 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 20:34:45,585 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 from AS
2024-08-11 20:34:58,507 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 from AS
2024-08-11 20:35:08,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1276780.0, ans=0.125
2024-08-11 20:35:14,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11750, loss[loss=0.0853, beats_loss=0.01301, ecapa_loss=0.0001528, whisper_loss=0.07077, over 16511.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01123, ecapa_loss=0.0001923, whisper_loss=0.09335, over 3919896.76 frames.
], batch size: 66, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:35:19,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1276880.0, ans=0.0
2024-08-11 20:35:21,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1276880.0, ans=0.09899494936611666
2024-08-11 20:35:49,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.651e+01 2.904e+01 3.391e+01 1.042e+02, threshold=5.808e+01, percent-clipped=2.0
2024-08-11 20:35:55,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1277080.0, ans=0.1
2024-08-11 20:36:36,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2024-08-11 20:36:37,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1277280.0, ans=0.0
2024-08-11 20:36:43,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11800, loss[loss=0.09911, beats_loss=0.01254, ecapa_loss=0.0001855, whisper_loss=0.08471, over 17756.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001916, whisper_loss=0.09316, over 3916243.79 frames. ], batch size: 74, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:36:51,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1277380.0, ans=0.0
2024-08-11 20:36:53,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1277380.0, ans=0.0
2024-08-11 20:36:54,134 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
19 from LS+wenet, 12 from Vox, 27 from AS
2024-08-11 20:37:13,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1277480.0, ans=0.125
2024-08-11 20:37:20,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=12.0
2024-08-11 20:37:24,117 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-11 20:37:32,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1277580.0, ans=0.025
2024-08-11 20:37:48,130 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 30 from Vox, 35 from AS
2024-08-11 20:38:12,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11850, loss[loss=0.1173, beats_loss=0.009647, ecapa_loss=0.0002316, whisper_loss=0.1053, over 15625.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001914, whisper_loss=0.09312, over 3933116.57 frames. ], batch size: 64, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:38:13,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1277880.0, ans=0.0
2024-08-11 20:38:36,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=15.0
2024-08-11 20:38:36,780 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
33 from LS+wenet, 23 from Vox, 32 from AS
2024-08-11 20:38:43,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.625e+01 2.967e+01 3.340e+01 5.309e+01, threshold=5.933e+01, percent-clipped=0.0
2024-08-11 20:38:55,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1278080.0, ans=0.1
2024-08-11 20:39:11,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1278180.0, ans=0.1
2024-08-11 20:39:30,814 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 from AS
2024-08-11 20:39:38,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11900, loss[loss=0.1169, beats_loss=0.009259, ecapa_loss=0.0002242, whisper_loss=0.1054, over 19756.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0001899, whisper_loss=0.0928, over 3935154.60 frames. ], batch size: 83, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:39:39,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1278380.0, ans=0.2
2024-08-11 20:39:50,330 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts.
22 from LS+wenet, 28 from Vox, 45 from AS
2024-08-11 20:40:02,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1278480.0, ans=0.125
2024-08-11 20:40:14,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1278580.0, ans=0.0
2024-08-11 20:40:17,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1278580.0, ans=0.2
2024-08-11 20:40:24,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1278580.0, ans=0.0
2024-08-11 20:40:43,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1278680.0, ans=0.125
2024-08-11 20:41:03,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 11950, loss[loss=0.1089, beats_loss=0.01175, ecapa_loss=0.0001925, whisper_loss=0.09524, over 21406.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001915, whisper_loss=0.09308, over 3898462.57 frames. ], batch size: 87, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:41:16,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1278880.0, ans=0.0
2024-08-11 20:41:37,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.570e+01 2.836e+01 3.237e+01 6.228e+01, threshold=5.672e+01, percent-clipped=0.0
2024-08-11 20:42:06,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1279180.0, ans=0.2
2024-08-11 20:42:15,728 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
15 from LS+wenet, 15 from Vox, 32 from AS
2024-08-11 20:42:16,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1279280.0, ans=0.0
2024-08-11 20:42:33,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12000, loss[loss=0.1056, beats_loss=0.01124, ecapa_loss=0.0001327, whisper_loss=0.09307, over 15376.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.0001904, whisper_loss=0.09291, over 3891575.54 frames. ], batch size: 57, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:42:33,194 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-11 20:43:16,174 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0006123, whisper_loss=0.25, over 922467.00 frames.
2024-08-11 20:43:35,217 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on SV_voxceleb1: loss=0.005094, beats_loss=0, ecapa_loss=0.0005094, whisper_loss=0, over 939242.00 frames.
2024-08-11 20:44:37,716 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5015, 1.7977, 1.3181, 1.2651, 1.2714, 1.2469, 1.5866, 1.4143], device='cuda:3')
2024-08-11 20:45:30,593 INFO [train_multi_KD3.py:1149] (3/4) Epoch 9, validation on AT_audioset: loss=0.02487, beats_loss=0.02487, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 20:45:30,597 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-11 20:45:33,397 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
24 from LS+wenet, 19 from Vox, 23 from AS
2024-08-11 20:45:44,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1279380.0, ans=0.125
2024-08-11 20:45:51,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1279480.0, ans=0.0
2024-08-11 20:45:56,662 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-11 20:46:23,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1279680.0, ans=0.0
2024-08-11 20:46:58,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1279880.0, ans=0.125
2024-08-11 20:46:59,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12050, loss[loss=0.1053, beats_loss=0.01261, ecapa_loss=0.0001955, whisper_loss=0.09073, over 22500.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0001905, whisper_loss=0.09291, over 3891668.06 frames. ], batch size: 91, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:47:01,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1279880.0, ans=0.125
2024-08-11 20:47:01,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1279880.0, ans=0.0
2024-08-11 20:47:08,401 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
17 from LS+wenet, 17 from Vox, 28 from AS
2024-08-11 20:47:23,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1279980.0, ans=0.125
2024-08-11 20:47:32,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.712e+01 3.113e+01 3.609e+01 6.588e+01, threshold=6.227e+01, percent-clipped=3.0
2024-08-11 20:47:39,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1280080.0, ans=0.2
2024-08-11 20:47:58,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1280180.0, ans=0.0
2024-08-11 20:48:03,373 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS
2024-08-11 20:48:18,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1280280.0, ans=0.0
2024-08-11 20:48:27,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12100, loss[loss=0.09519, beats_loss=0.01296, ecapa_loss=0.0001808, whisper_loss=0.08042, over 22262.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01132, ecapa_loss=0.0001901, whisper_loss=0.09234, over 3875094.32 frames. ], batch size: 94, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:48:36,421 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 from AS
2024-08-11 20:48:39,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
2024-08-11 20:48:41,536 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS
2024-08-11 20:48:52,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs.
limit=22.5
2024-08-11 20:48:58,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1280480.0, ans=0.125
2024-08-11 20:49:00,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1280480.0, ans=0.125
2024-08-11 20:49:08,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1280580.0, ans=0.2
2024-08-11 20:49:10,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0
2024-08-11 20:49:19,320 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 11 from Vox, 25 from AS
2024-08-11 20:49:52,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1280780.0, ans=15.0
2024-08-11 20:49:55,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12150, loss[loss=0.09054, beats_loss=0.01447, ecapa_loss=0.0001575, whisper_loss=0.07449, over 22388.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01132, ecapa_loss=0.0001905, whisper_loss=0.09212, over 3884814.23 frames.
], batch size: 90, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:50:06,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1280880.0, ans=0.125
2024-08-11 20:50:09,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1280880.0, ans=0.1
2024-08-11 20:50:23,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1280980.0, ans=0.2
2024-08-11 20:50:26,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.646e+01 3.020e+01 3.424e+01 5.278e+01, threshold=6.041e+01, percent-clipped=0.0
2024-08-11 20:50:42,046 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-11 20:50:54,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1281180.0, ans=0.2
2024-08-11 20:50:59,085 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 from AS
2024-08-11 20:51:08,102 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.448e+00
2024-08-11 20:51:19,634 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12200, loss[loss=0.1265, beats_loss=0.009501, ecapa_loss=0.0001778, whisper_loss=0.1152, over 19658.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01128, ecapa_loss=0.0001901, whisper_loss=0.09276, over 3900145.01 frames. ], batch size: 77, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:51:19,902 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts.
17 from LS+wenet, 25 from Vox, 28 from AS
2024-08-11 20:51:25,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1281380.0, ans=0.025
2024-08-11 20:51:36,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281480.0, ans=0.1
2024-08-11 20:51:52,382 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 14 from Vox, 39 from AS
2024-08-11 20:51:53,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0
2024-08-11 20:52:04,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=15.0
2024-08-11 20:52:21,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1281680.0, ans=0.2
2024-08-11 20:52:43,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12250, loss[loss=0.1145, beats_loss=0.01087, ecapa_loss=0.0001933, whisper_loss=0.1017, over 22804.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01126, ecapa_loss=0.0001898, whisper_loss=0.0929, over 3891872.51 frames.
], batch size: 92, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:52:48,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1281880.0, ans=0.125
2024-08-11 20:52:53,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1281880.0, ans=0.125
2024-08-11 20:53:16,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.578e+01 2.932e+01 3.420e+01 1.649e+02, threshold=5.864e+01, percent-clipped=1.0
2024-08-11 20:53:16,276 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS
2024-08-11 20:53:23,742 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 from AS
2024-08-11 20:53:28,835 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 from AS
2024-08-11 20:53:29,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1282080.0, ans=0.1
2024-08-11 20:53:31,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1282080.0, ans=0.125
2024-08-11 20:54:05,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0
2024-08-11 20:54:08,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12300, loss[loss=0.1078, beats_loss=0.01258, ecapa_loss=0.0001882, whisper_loss=0.09338, over 23228.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.0001881, whisper_loss=0.09294, over 3879522.98 frames.
], batch size: 95, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:54:32,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1282480.0, ans=0.0
2024-08-11 20:54:36,547 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 from AS
2024-08-11 20:54:51,029 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS
2024-08-11 20:54:54,511 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 31 from Vox, 24 from AS
2024-08-11 20:54:56,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0
2024-08-11 20:54:59,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1282580.0, ans=0.025
2024-08-11 20:55:21,092 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 20:55:28,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1282780.0, ans=0.2
2024-08-11 20:55:30,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1282780.0, ans=0.125
2024-08-11 20:55:30,791 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.075e+00
2024-08-11 20:55:35,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12350, loss[loss=0.1211, beats_loss=0.01086, ecapa_loss=0.0001652, whisper_loss=0.1086, over 22239.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001916, whisper_loss=0.09307, over 3879142.22 frames.
], batch size: 89, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:55:44,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1282880.0, ans=0.125
2024-08-11 20:55:54,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1282980.0, ans=0.1
2024-08-11 20:55:57,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1282980.0, ans=0.125
2024-08-11 20:56:05,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1282980.0, ans=0.025
2024-08-11 20:56:06,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.524e+01 2.954e+01 3.299e+01 5.655e+01, threshold=5.908e+01, percent-clipped=0.0
2024-08-11 20:56:15,205 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 from AS
2024-08-11 20:56:28,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1283180.0, ans=0.125
2024-08-11 20:56:59,140 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 from AS
2024-08-11 20:57:01,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12400, loss[loss=0.0858, beats_loss=0.01265, ecapa_loss=0.0001427, whisper_loss=0.07173, over 16588.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01113, ecapa_loss=0.0001915, whisper_loss=0.09373, over 3858444.70 frames.
], batch size: 61, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:57:03,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1283380.0, ans=0.2
2024-08-11 20:57:12,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1283380.0, ans=0.125
2024-08-11 20:57:21,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0
2024-08-11 20:57:37,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1283580.0, ans=0.0
2024-08-11 20:58:04,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1283680.0, ans=0.125
2024-08-11 20:58:06,318 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS
2024-08-11 20:58:20,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1283780.0, ans=0.125
2024-08-11 20:58:21,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1283780.0, ans=0.125
2024-08-11 20:58:23,383 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 28 from Vox, 34 from AS
2024-08-11 20:58:25,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12450, loss[loss=0.1207, beats_loss=0.01046, ecapa_loss=0.0001728, whisper_loss=0.1085, over 22594.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01111, ecapa_loss=0.0001919, whisper_loss=0.09372, over 3898098.48 frames. ], batch size: 91, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:58:35,824 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
21 from LS+wenet, 12 from Vox, 21 from AS
2024-08-11 20:58:42,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1283980.0, ans=0.2
2024-08-11 20:58:43,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1283980.0, ans=0.125
2024-08-11 20:58:56,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.629e+01 2.973e+01 3.425e+01 5.618e+01, threshold=5.946e+01, percent-clipped=0.0
2024-08-11 20:59:02,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5
2024-08-11 20:59:04,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1284080.0, ans=0.025
2024-08-11 20:59:27,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1284180.0, ans=0.1
2024-08-11 20:59:28,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1284180.0, ans=0.09899494936611666
2024-08-11 20:59:36,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1284280.0, ans=0.0
2024-08-11 20:59:42,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1284280.0, ans=0.125
2024-08-11 20:59:42,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0
2024-08-11 20:59:48,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12500, loss[loss=0.1075, beats_loss=0.01084, ecapa_loss=0.0001648, whisper_loss=0.09499, over 18180.00 frames.
], tot_loss[loss=0.1068, beats_loss=0.01109, ecapa_loss=0.0001913, whisper_loss=0.09385, over 3890054.54 frames. ], batch size: 72, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:59:50,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0
2024-08-11 21:00:00,321 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 30 from Vox, 30 from AS
2024-08-11 21:00:10,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0
2024-08-11 21:00:16,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1284480.0, ans=0.125
2024-08-11 21:00:45,983 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 21:00:47,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0
2024-08-11 21:01:14,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12550, loss[loss=0.1301, beats_loss=0.01048, ecapa_loss=0.0001842, whisper_loss=0.1177, over 15239.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01116, ecapa_loss=0.000191, whisper_loss=0.09399, over 3893212.04 frames. ], batch size: 58, lr: 6.88e-03, grad_scale: 1.152921504606847e+18
2024-08-11 21:01:30,400 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 21:01:44,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.709e+01 3.086e+01 3.503e+01 6.566e+01, threshold=6.173e+01, percent-clipped=1.0
2024-08-11 21:01:49,047 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 21:02:25,575 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 21:02:29,160 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 21:02:30,830 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 21:02:32,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1285280.0, ans=0.95 2024-08-11 21:02:35,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12600, loss[loss=0.1202, beats_loss=0.01132, ecapa_loss=0.0001695, whisper_loss=0.1072, over 22602.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01116, ecapa_loss=0.0001918, whisper_loss=0.09433, over 3903616.13 frames. ], batch size: 91, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:03:01,863 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 21:03:10,270 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 21:03:12,400 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 21:03:16,904 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 21:03:27,947 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 21:03:31,198 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 21:03:31,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
limit=12.0 2024-08-11 21:03:32,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1285680.0, ans=0.0 2024-08-11 21:03:34,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=22.5 2024-08-11 21:03:56,217 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 21:03:57,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12650, loss[loss=0.1144, beats_loss=0.01149, ecapa_loss=0.0001907, whisper_loss=0.101, over 20422.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01119, ecapa_loss=0.0001921, whisper_loss=0.09388, over 3887496.58 frames. ], batch size: 79, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:04:01,209 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 15 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 21:04:31,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.568e+01 2.843e+01 3.370e+01 6.340e+01, threshold=5.685e+01, percent-clipped=1.0 2024-08-11 21:04:39,985 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 21:04:47,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1286080.0, ans=0.125 2024-08-11 21:04:57,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-08-11 21:05:00,644 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 21:05:04,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. 
limit=8.0 2024-08-11 21:05:05,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1286180.0, ans=0.2 2024-08-11 21:05:25,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12700, loss[loss=0.06909, beats_loss=0.01372, ecapa_loss=0.0001612, whisper_loss=0.05376, over 15214.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0001916, whisper_loss=0.09288, over 3893292.02 frames. ], batch size: 60, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:05:34,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1286380.0, ans=0.125 2024-08-11 21:05:38,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1286380.0, ans=0.0 2024-08-11 21:05:49,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1286480.0, ans=0.2 2024-08-11 21:05:58,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1286580.0, ans=0.04949747468305833 2024-08-11 21:06:03,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1286580.0, ans=0.125 2024-08-11 21:06:22,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-11 21:06:22,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=12.0 2024-08-11 21:06:47,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12750, loss[loss=0.1177, beats_loss=0.01075, ecapa_loss=0.0001978, whisper_loss=0.105, over 22922.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01125, ecapa_loss=0.000192, whisper_loss=0.093, over 3902024.42 frames. ], batch size: 93, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:06:49,976 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 26 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 21:07:07,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1286980.0, ans=0.125 2024-08-11 21:07:10,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1286980.0, ans=0.125 2024-08-11 21:07:19,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.648e+01 3.001e+01 3.436e+01 1.023e+02, threshold=6.002e+01, percent-clipped=1.0 2024-08-11 21:07:33,835 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 21:07:57,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1287180.0, ans=0.125 2024-08-11 21:08:00,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1287280.0, ans=0.125 2024-08-11 21:08:15,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1287380.0, ans=0.1 2024-08-11 21:08:16,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12800, loss[loss=0.08922, beats_loss=0.01481, ecapa_loss=0.0001522, whisper_loss=0.07289, over 18063.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0001918, whisper_loss=0.09255, over 3933605.21 frames. 
], batch size: 72, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:08:26,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1287380.0, ans=0.95 2024-08-11 21:08:34,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-11 21:08:44,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1287480.0, ans=0.0 2024-08-11 21:09:30,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1287780.0, ans=0.0 2024-08-11 21:09:36,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12850, loss[loss=0.1172, beats_loss=0.008595, ecapa_loss=0.000229, whisper_loss=0.1063, over 21694.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01133, ecapa_loss=0.0001919, whisper_loss=0.09139, over 3892801.03 frames. 
], batch size: 87, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:09:53,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1287980.0, ans=0.125 2024-08-11 21:09:56,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1287980.0, ans=0.125 2024-08-11 21:10:04,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1287980.0, ans=6.0 2024-08-11 21:10:07,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1287980.0, ans=0.125 2024-08-11 21:10:09,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.541e+01 2.885e+01 3.297e+01 4.788e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-11 21:10:20,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2024-08-11 21:10:36,079 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 21:10:43,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1288280.0, ans=0.05 2024-08-11 21:10:47,835 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 21:11:00,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12900, loss[loss=0.08258, beats_loss=0.01015, ecapa_loss=0.0001886, whisper_loss=0.07055, over 16068.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01125, ecapa_loss=0.0001929, whisper_loss=0.0915, over 3882993.07 frames. ], batch size: 63, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:11:06,246 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 21:11:12,099 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 21:11:14,976 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 21:11:27,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1288480.0, ans=0.025 2024-08-11 21:11:32,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1288580.0, ans=0.2 2024-08-11 21:11:34,176 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 21:11:34,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1288580.0, ans=0.0 2024-08-11 21:11:48,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1288680.0, ans=0.125 2024-08-11 21:11:55,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1288680.0, ans=0.125 2024-08-11 21:12:16,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1288780.0, ans=0.0 2024-08-11 21:12:21,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 12950, loss[loss=0.08385, beats_loss=0.01288, ecapa_loss=0.0001595, whisper_loss=0.06938, over 20768.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001928, whisper_loss=0.09202, over 3872757.00 frames. ], batch size: 86, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:12:26,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1288880.0, ans=0.2 2024-08-11 21:12:36,326 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 21:12:45,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1288980.0, ans=0.0 2024-08-11 21:12:46,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1288980.0, ans=0.04949747468305833 2024-08-11 21:12:54,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.715e+01 3.125e+01 3.606e+01 5.827e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 21:12:55,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1289080.0, ans=15.0 2024-08-11 21:13:02,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1289080.0, ans=0.1 2024-08-11 21:13:09,570 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 21:13:20,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1289180.0, ans=0.125 2024-08-11 21:13:24,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0 2024-08-11 21:13:34,709 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 21:13:41,099 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 21:13:45,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1289380.0, ans=0.125 2024-08-11 21:13:45,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13000, loss[loss=0.1268, beats_loss=0.008167, ecapa_loss=0.0002299, whisper_loss=0.1164, over 20737.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001918, whisper_loss=0.09243, over 3884476.94 frames. ], batch size: 83, lr: 6.87e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:13:50,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:13:51,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1289380.0, ans=10.0 2024-08-11 21:13:59,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1289380.0, ans=0.125 2024-08-11 21:14:22,818 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 21:14:27,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1289580.0, ans=0.125 2024-08-11 21:14:28,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1289580.0, ans=0.125 2024-08-11 21:14:59,545 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.696e+00 2024-08-11 21:15:01,070 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 21:15:05,736 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 21:15:12,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13050, loss[loss=0.1106, beats_loss=0.01157, ecapa_loss=0.0001732, whisper_loss=0.09733, over 18677.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01107, ecapa_loss=0.0001918, whisper_loss=0.09325, over 3893203.21 frames. 
], batch size: 74, lr: 6.86e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:15:20,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1289880.0, ans=0.0 2024-08-11 21:15:30,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2024-08-11 21:15:36,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1289980.0, ans=0.2 2024-08-11 21:15:43,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.491e+01 2.763e+01 3.152e+01 5.442e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-11 21:15:49,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1290080.0, ans=0.125 2024-08-11 21:15:52,659 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 21:16:05,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1290180.0, ans=0.125 2024-08-11 21:16:14,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5 2024-08-11 21:16:16,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1290280.0, ans=0.1 2024-08-11 21:16:28,196 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 21:16:34,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13100, loss[loss=0.1159, beats_loss=0.0107, ecapa_loss=0.000164, whisper_loss=0.1036, over 19858.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01102, ecapa_loss=0.0001913, whisper_loss=0.09379, over 3874050.34 frames. 
], batch size: 75, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:16:37,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=12.0 2024-08-11 21:16:41,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2024-08-11 21:16:52,197 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 21:17:28,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1290680.0, ans=0.1 2024-08-11 21:17:36,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.78 vs. limit=15.0 2024-08-11 21:17:50,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5 2024-08-11 21:17:58,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1290780.0, ans=0.125 2024-08-11 21:17:59,612 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 21:17:59,867 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.737e+00 2024-08-11 21:18:02,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13150, loss[loss=0.09115, beats_loss=0.01229, ecapa_loss=0.0001858, whisper_loss=0.077, over 22875.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01106, ecapa_loss=0.0001916, whisper_loss=0.09333, over 3884378.64 frames. 
], batch size: 94, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:18:06,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1290880.0, ans=0.125 2024-08-11 21:18:16,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1290880.0, ans=0.125 2024-08-11 21:18:23,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1290980.0, ans=0.2 2024-08-11 21:18:29,030 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 21:18:29,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1290980.0, ans=0.0 2024-08-11 21:18:36,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.521e+01 2.887e+01 3.350e+01 6.017e+01, threshold=5.775e+01, percent-clipped=1.0 2024-08-11 21:18:36,425 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 21:18:38,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1291080.0, ans=0.1 2024-08-11 21:18:46,965 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 21:19:05,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1291180.0, ans=0.1 2024-08-11 21:19:13,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1291280.0, ans=0.125 2024-08-11 21:19:15,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. 
limit=15.0 2024-08-11 21:19:20,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1291280.0, ans=0.0 2024-08-11 21:19:24,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13200, loss[loss=0.1164, beats_loss=0.008869, ecapa_loss=0.0001982, whisper_loss=0.1055, over 20409.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001911, whisper_loss=0.09301, over 3860694.03 frames. ], batch size: 79, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:19:36,348 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 21:19:41,725 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 21:20:04,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1291580.0, ans=0.0 2024-08-11 21:20:42,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1291780.0, ans=0.125 2024-08-11 21:20:48,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13250, loss[loss=0.07484, beats_loss=0.01341, ecapa_loss=0.0001195, whisper_loss=0.06024, over 15157.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.000191, whisper_loss=0.0921, over 3873462.64 frames. ], batch size: 57, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:20:53,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. 
limit=15.0 2024-08-11 21:21:21,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.554e+01 3.002e+01 3.444e+01 4.623e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 21:21:28,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1292080.0, ans=0.0 2024-08-11 21:21:33,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1292080.0, ans=0.1 2024-08-11 21:21:43,643 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 21:21:45,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1292180.0, ans=0.0 2024-08-11 21:21:50,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-08-11 21:21:52,319 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-11 21:21:53,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-11 21:22:05,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13300, loss[loss=0.099, beats_loss=0.01475, ecapa_loss=0.00016, whisper_loss=0.08265, over 22759.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01123, ecapa_loss=0.0001897, whisper_loss=0.09167, over 3883404.72 frames. ], batch size: 92, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:22:09,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-08-11 21:22:28,945 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 21:22:36,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1292480.0, ans=0.0 2024-08-11 21:22:56,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1292680.0, ans=0.125 2024-08-11 21:22:59,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1292680.0, ans=12.0 2024-08-11 21:23:03,874 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 21:23:05,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1292680.0, ans=0.1 2024-08-11 21:23:21,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1292780.0, ans=0.0 2024-08-11 21:23:25,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13350, loss[loss=0.1014, beats_loss=0.009093, ecapa_loss=0.0002063, whisper_loss=0.09023, over 14870.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01123, ecapa_loss=0.0001885, whisper_loss=0.09209, over 3872514.32 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:23:26,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1292880.0, ans=0.125 2024-08-11 21:23:29,683 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 27 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-11 21:23:31,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-11 21:23:35,628 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 21:23:37,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1292880.0, ans=0.125 2024-08-11 21:23:49,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1292980.0, ans=0.2 2024-08-11 21:23:51,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1292980.0, ans=0.125 2024-08-11 21:23:55,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.575e+01 2.972e+01 3.296e+01 7.873e+01, threshold=5.944e+01, percent-clipped=3.0 2024-08-11 21:23:58,877 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 21:23:59,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1293080.0, ans=0.125 2024-08-11 21:24:01,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1293080.0, ans=0.0 2024-08-11 21:24:16,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1293180.0, ans=0.125 2024-08-11 21:24:19,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1293180.0, ans=0.0 2024-08-11 21:24:19,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=12.0 2024-08-11 21:24:26,304 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 21:24:32,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1293280.0, ans=0.0 2024-08-11 21:24:37,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13400, loss[loss=0.09107, beats_loss=0.009452, ecapa_loss=0.0002248, whisper_loss=0.07937, over 14968.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01126, ecapa_loss=0.0001885, whisper_loss=0.0922, over 3838246.54 frames. ], batch size: 61, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:24:46,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1293380.0, ans=0.125 2024-08-11 21:24:47,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1293380.0, ans=0.0 2024-08-11 21:25:03,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1293580.0, ans=0.0 2024-08-11 21:25:11,719 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-11 21:25:38,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1293780.0, ans=0.125 2024-08-11 21:25:46,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13450, loss[loss=0.1308, beats_loss=0.008624, ecapa_loss=0.0001438, whisper_loss=0.1207, over 17568.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0113, ecapa_loss=0.000188, whisper_loss=0.09215, over 3854050.56 frames. 
], batch size: 65, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:25:51,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1293880.0, ans=0.125 2024-08-11 21:26:04,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1293980.0, ans=0.1 2024-08-11 21:26:11,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1293980.0, ans=0.1 2024-08-11 21:26:12,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2024-08-11 21:26:15,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.605e+01 2.918e+01 3.272e+01 4.452e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 21:26:18,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1294080.0, ans=0.0 2024-08-11 21:26:28,913 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 21:26:37,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1294180.0, ans=0.125 2024-08-11 21:26:41,276 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 21:26:53,974 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-11 21:26:55,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13500, loss[loss=0.1071, beats_loss=0.009061, ecapa_loss=0.0002477, whisper_loss=0.09556, over 16875.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01131, ecapa_loss=0.0001873, whisper_loss=0.0922, over 3862941.90 frames. 
], batch size: 72, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:27:35,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1294680.0, ans=0.0 2024-08-11 21:27:40,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1294680.0, ans=0.0 2024-08-11 21:27:48,898 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 21:28:01,150 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.739e-02 2024-08-11 21:28:03,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13550, loss[loss=0.0964, beats_loss=0.01161, ecapa_loss=0.0001943, whisper_loss=0.08285, over 21823.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01132, ecapa_loss=0.0001879, whisper_loss=0.09195, over 3860470.25 frames. ], batch size: 89, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:28:28,696 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 21:28:32,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.573e+01 2.919e+01 3.325e+01 1.633e+02, threshold=5.839e+01, percent-clipped=1.0 2024-08-11 21:28:40,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1295080.0, ans=0.1 2024-08-11 21:28:42,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1295080.0, ans=0.125 2024-08-11 21:29:06,580 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 21:29:12,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13600, loss[loss=0.1167, beats_loss=0.01138, ecapa_loss=0.0001818, whisper_loss=0.1035, over 22909.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01137, ecapa_loss=0.0001878, whisper_loss=0.09177, over 3877936.10 frames. ], batch size: 92, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:29:18,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1295380.0, ans=0.125 2024-08-11 21:29:25,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1295480.0, ans=0.0 2024-08-11 21:29:39,279 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 21:29:55,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1295680.0, ans=0.0 2024-08-11 21:30:00,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1295680.0, ans=0.0 2024-08-11 21:30:02,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2024-08-11 21:30:08,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-11 21:30:11,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1295780.0, ans=0.015 2024-08-11 21:30:12,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1295780.0, ans=0.125 2024-08-11 21:30:20,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13650, loss[loss=0.08926, beats_loss=0.01397, ecapa_loss=0.0001477, whisper_loss=0.07381, over 16673.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01147, ecapa_loss=0.0001884, whisper_loss=0.09121, over 3872133.51 frames. 
], batch size: 66, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:30:24,628 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 21:30:26,033 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 21:30:43,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1295980.0, ans=0.05 2024-08-11 21:30:48,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.503e+01 2.904e+01 3.318e+01 5.006e+01, threshold=5.809e+01, percent-clipped=0.0 2024-08-11 21:30:49,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1296080.0, ans=0.125 2024-08-11 21:30:50,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-11 21:31:07,550 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.233e+00 2024-08-11 21:31:08,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296180.0, ans=0.1 2024-08-11 21:31:16,765 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-11 21:31:28,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13700, loss[loss=0.1105, beats_loss=0.01013, ecapa_loss=0.0002255, whisper_loss=0.0981, over 22256.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01136, ecapa_loss=0.0001877, whisper_loss=0.09179, over 3854779.32 frames. ], batch size: 91, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:31:48,657 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 21:31:57,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1296580.0, ans=0.2 2024-08-11 21:32:00,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1296580.0, ans=0.0 2024-08-11 21:32:11,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1296680.0, ans=0.0 2024-08-11 21:32:18,273 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 21:32:33,175 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-11 21:32:38,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13750, loss[loss=0.1246, beats_loss=0.01073, ecapa_loss=0.0002277, whisper_loss=0.1116, over 19308.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0001888, whisper_loss=0.09321, over 3854838.84 frames. ], batch size: 76, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:32:40,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1296880.0, ans=0.0 2024-08-11 21:32:47,156 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 21:32:50,133 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 21:32:51,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1296980.0, ans=10.0 2024-08-11 21:32:55,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296980.0, ans=0.1 2024-08-11 21:32:55,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=12.0 2024-08-11 21:33:07,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.561e+01 2.855e+01 3.257e+01 5.078e+01, threshold=5.711e+01, percent-clipped=0.0 2024-08-11 21:33:15,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1297080.0, ans=0.1 2024-08-11 21:33:30,843 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 21:33:41,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-08-11 21:33:48,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13800, loss[loss=0.09587, beats_loss=0.01132, ecapa_loss=0.0001637, whisper_loss=0.08291, over 13693.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01123, ecapa_loss=0.00019, whisper_loss=0.09243, over 3820354.38 frames. ], batch size: 53, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:34:05,189 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 21:34:23,626 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 38 from Vox, 32 fro AS 2024-08-11 21:34:25,075 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
23 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 21:34:56,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1297880.0, ans=0.125 2024-08-11 21:34:57,483 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13850, loss[loss=0.1166, beats_loss=0.01014, ecapa_loss=0.0001786, whisper_loss=0.1047, over 23569.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0112, ecapa_loss=0.0001907, whisper_loss=0.09266, over 3873324.25 frames. ], batch size: 89, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:35:12,659 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 21:35:14,034 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 21:35:18,317 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 21:35:19,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2024-08-11 21:35:19,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1297980.0, ans=0.0 2024-08-11 21:35:20,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=12.0 2024-08-11 21:35:25,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1298080.0, ans=0.1 2024-08-11 21:35:26,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.750e+01 3.088e+01 3.546e+01 6.102e+01, threshold=6.176e+01, percent-clipped=2.0 2024-08-11 21:35:39,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1298180.0, ans=0.125 2024-08-11 21:35:45,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1298180.0, ans=0.125 2024-08-11 21:35:55,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-08-11 21:36:05,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13900, loss[loss=0.1057, beats_loss=0.01204, ecapa_loss=0.0002091, whisper_loss=0.09153, over 22434.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01121, ecapa_loss=0.0001901, whisper_loss=0.09249, over 3872399.89 frames. ], batch size: 94, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:36:06,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. 
limit=15.0 2024-08-11 21:36:18,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1298480.0, ans=0.125 2024-08-11 21:36:34,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1298580.0, ans=0.07 2024-08-11 21:36:42,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1298580.0, ans=0.0 2024-08-11 21:36:47,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1298680.0, ans=0.07 2024-08-11 21:36:48,877 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 21:36:50,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1298680.0, ans=0.0 2024-08-11 21:37:00,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2024-08-11 21:37:08,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1298780.0, ans=0.0 2024-08-11 21:37:09,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0 2024-08-11 21:37:14,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 13950, loss[loss=0.118, beats_loss=0.01082, ecapa_loss=0.0001714, whisper_loss=0.1054, over 22296.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01114, ecapa_loss=0.0001898, whisper_loss=0.09257, over 3842050.34 frames. ], batch size: 90, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:37:24,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.22 vs. 
limit=15.0 2024-08-11 21:37:26,000 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 21:37:43,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.710e+01 3.048e+01 3.326e+01 4.854e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-11 21:37:47,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1299080.0, ans=0.125 2024-08-11 21:37:54,078 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 21:38:23,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14000, loss[loss=0.1184, beats_loss=0.01112, ecapa_loss=0.0001813, whisper_loss=0.1055, over 22673.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001887, whisper_loss=0.09274, over 3863117.39 frames. ], batch size: 89, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:38:43,059 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 21:39:05,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1299680.0, ans=0.0 2024-08-11 21:39:05,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-11 21:39:14,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2024-08-11 21:39:18,747 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 21:39:26,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1299780.0, ans=0.0 2024-08-11 21:39:32,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14050, loss[loss=0.1441, beats_loss=0.01045, ecapa_loss=0.0001856, whisper_loss=0.1318, over 23574.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01108, ecapa_loss=0.0001874, whisper_loss=0.09272, over 3858209.00 frames. ], batch size: 90, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:39:38,451 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 21:39:53,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1299980.0, ans=0.2 2024-08-11 21:39:59,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1300080.0, ans=0.125 2024-08-11 21:40:01,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.630e+01 2.929e+01 3.311e+01 9.104e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 21:40:04,571 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 21:40:13,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1300180.0, ans=0.125 2024-08-11 21:40:15,664 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 21:40:28,930 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 21:40:41,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14100, loss[loss=0.1158, beats_loss=0.01029, ecapa_loss=0.0001734, whisper_loss=0.1037, over 18757.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.0111, ecapa_loss=0.0001881, whisper_loss=0.09329, over 3876068.23 frames. ], batch size: 71, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:40:56,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1300480.0, ans=0.125 2024-08-11 21:41:09,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1300580.0, ans=0.0 2024-08-11 21:41:19,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1300580.0, ans=0.0 2024-08-11 21:41:25,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2024-08-11 21:41:27,523 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 21:41:37,109 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 21:41:50,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14150, loss[loss=0.1069, beats_loss=0.009242, ecapa_loss=0.0001802, whisper_loss=0.09581, over 16204.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01111, ecapa_loss=0.0001866, whisper_loss=0.09373, over 3874953.95 frames. ], batch size: 56, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:41:51,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-08-11 21:42:02,071 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 21:42:10,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1300980.0, ans=0.2 2024-08-11 21:42:16,908 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 21:42:19,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.615e+01 2.850e+01 3.033e+01 5.082e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 21:42:19,938 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 21:42:22,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1301080.0, ans=0.1 2024-08-11 21:42:29,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1301080.0, ans=0.2 2024-08-11 21:42:56,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1301280.0, ans=0.125 2024-08-11 21:42:58,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1301380.0, ans=0.125 2024-08-11 21:42:59,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14200, loss[loss=0.1174, beats_loss=0.008985, ecapa_loss=0.0002145, whisper_loss=0.1062, over 22473.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01107, ecapa_loss=0.000187, whisper_loss=0.09458, over 3916952.98 frames. ], batch size: 89, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:43:15,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1301480.0, ans=0.95 2024-08-11 21:43:28,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1301580.0, ans=0.1 2024-08-11 21:43:28,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=15.0 2024-08-11 21:43:29,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-08-11 21:43:31,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1301580.0, ans=0.125 2024-08-11 21:43:38,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1301580.0, ans=0.125 2024-08-11 21:43:45,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1301680.0, ans=0.0 2024-08-11 21:43:57,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1301780.0, ans=0.0 2024-08-11 21:44:08,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14250, loss[loss=0.1069, beats_loss=0.005702, ecapa_loss=0.0002036, whisper_loss=0.09918, over 14902.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01103, ecapa_loss=0.0001874, whisper_loss=0.0944, over 3924933.42 frames. ], batch size: 55, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:44:17,945 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 21:44:37,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.747e+01 3.033e+01 3.629e+01 5.919e+01, threshold=6.067e+01, percent-clipped=2.0 2024-08-11 21:44:38,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. 
limit=15.0 2024-08-11 21:44:44,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1302080.0, ans=0.2 2024-08-11 21:44:52,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1302180.0, ans=0.125 2024-08-11 21:44:52,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2024-08-11 21:44:53,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1302180.0, ans=0.0 2024-08-11 21:44:59,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1302180.0, ans=0.125 2024-08-11 21:45:11,239 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 26 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 21:45:17,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14300, loss[loss=0.1077, beats_loss=0.009977, ecapa_loss=0.0001682, whisper_loss=0.096, over 17434.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01108, ecapa_loss=0.0001869, whisper_loss=0.09382, over 3916531.27 frames. ], batch size: 65, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:45:19,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-11 21:46:06,698 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 21:46:16,256 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 21:46:20,564 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
8 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 21:46:27,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14350, loss[loss=0.1187, beats_loss=0.008951, ecapa_loss=0.000173, whisper_loss=0.108, over 15843.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01109, ecapa_loss=0.0001856, whisper_loss=0.0937, over 3923679.39 frames. ], batch size: 61, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:46:27,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1302880.0, ans=0.125 2024-08-11 21:46:41,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1302980.0, ans=0.2 2024-08-11 21:46:46,757 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 21:46:56,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.719e+01 2.981e+01 3.464e+01 5.321e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 21:46:58,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1303080.0, ans=0.125 2024-08-11 21:47:02,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1303080.0, ans=0.0 2024-08-11 21:47:09,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303180.0, ans=0.1 2024-08-11 21:47:13,532 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 21:47:21,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1303280.0, ans=0.125 2024-08-11 21:47:31,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.67 vs. 
limit=10.0 2024-08-11 21:47:34,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303280.0, ans=0.1 2024-08-11 21:47:37,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14400, loss[loss=0.1165, beats_loss=0.01052, ecapa_loss=0.0001889, whisper_loss=0.104, over 22722.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.0001864, whisper_loss=0.09342, over 3947301.57 frames. ], batch size: 89, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:47:45,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-11 21:47:48,494 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 21:47:51,316 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 21:48:00,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1303480.0, ans=0.0 2024-08-11 21:48:09,481 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 9 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 21:48:18,966 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 21:48:45,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 9, batch 14450, loss[loss=0.08295, beats_loss=0.01159, ecapa_loss=0.0001765, whisper_loss=0.06959, over 19990.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01129, ecapa_loss=0.0001869, whisper_loss=0.0921, over 3957156.06 frames. 
], batch size: 82, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:48:47,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1303880.0, ans=0.125 2024-08-11 21:48:48,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1303880.0, ans=0.2 2024-08-11 21:48:54,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1303880.0, ans=0.1 2024-08-11 21:49:05,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1303980.0, ans=0.2 2024-08-11 21:49:13,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.624e+01 2.900e+01 3.333e+01 5.803e+01, threshold=5.799e+01, percent-clipped=0.0 2024-08-11 21:50:30,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 0, loss[loss=0.1164, beats_loss=0.01116, ecapa_loss=0.0001828, whisper_loss=0.1034, over 19945.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01116, ecapa_loss=0.0001828, whisper_loss=0.1034, over 19945.00 frames. ], batch size: 77, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:50:30,434 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 21:51:12,059 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([8.0901e-04, 5.0043e-02, 1.8018e-03, 3.1572e+00, 2.7984e-03, 3.3498e-02, 7.7152e-02, 3.0879e-02], device='cuda:3') 2024-08-11 21:51:13,137 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on ASR_libri: loss=0.2568, beats_loss=0, ecapa_loss=0.0006206, whisper_loss=0.2506, over 922467.00 frames. 2024-08-11 21:51:29,338 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on SV_voxceleb1: loss=0.005051, beats_loss=0, ecapa_loss=0.0005051, whisper_loss=0, over 939242.00 frames. 
2024-08-11 21:53:07,083 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0264, 2.6820, 2.4922, 2.6020], device='cuda:3') 2024-08-11 21:53:33,441 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on AT_audioset: loss=0.02495, beats_loss=0.02495, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 21:53:33,444 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 21:53:40,697 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-11 21:54:05,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1304420.0, ans=0.1 2024-08-11 21:54:11,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1304420.0, ans=0.09899494936611666 2024-08-11 21:54:16,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1304420.0, ans=0.0 2024-08-11 21:55:17,002 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 21:55:28,837 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 21:55:43,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 50, loss[loss=0.09954, beats_loss=0.01041, ecapa_loss=0.0001586, whisper_loss=0.08755, over 20277.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01077, ecapa_loss=0.0001911, whisper_loss=0.09232, over 897562.94 frames. ], batch size: 78, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:55:56,110 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 21:55:58,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1304820.0, ans=0.1 2024-08-11 21:56:39,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1305020.0, ans=0.125 2024-08-11 21:56:47,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.905e+01 3.307e+01 3.702e+01 5.786e+01, threshold=6.614e+01, percent-clipped=0.0 2024-08-11 21:56:58,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1305120.0, ans=0.125 2024-08-11 21:57:05,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1305120.0, ans=0.125 2024-08-11 21:57:38,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1305220.0, ans=0.125 2024-08-11 21:57:41,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 100, loss[loss=0.09361, beats_loss=0.01073, ecapa_loss=0.000189, whisper_loss=0.08099, over 20313.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001922, whisper_loss=0.09081, over 1533359.34 frames. ], batch size: 81, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:57:57,065 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 21:58:01,282 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 21:58:03,773 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 21:58:06,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.82 vs. 
limit=22.5 2024-08-11 21:58:13,373 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 21:58:31,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1305520.0, ans=0.2 2024-08-11 21:58:39,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1305520.0, ans=0.125 2024-08-11 21:59:24,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1305720.0, ans=0.125 2024-08-11 21:59:24,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2024-08-11 21:59:32,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 150, loss[loss=0.1125, beats_loss=0.01084, ecapa_loss=0.0001884, whisper_loss=0.09981, over 16171.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001891, whisper_loss=0.09094, over 2043344.74 frames. ], batch size: 62, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:59:33,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1305820.0, ans=0.0 2024-08-11 21:59:59,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-11 22:00:20,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.798e+01 3.187e+01 3.633e+01 2.129e+02, threshold=6.375e+01, percent-clipped=1.0 2024-08-11 22:00:25,154 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 22:00:54,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1306220.0, ans=0.2 2024-08-11 22:00:58,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 200, loss[loss=0.1134, beats_loss=0.009328, ecapa_loss=0.0002533, whisper_loss=0.1016, over 21092.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0106, ecapa_loss=0.0001873, whisper_loss=0.09309, over 2439287.89 frames. ], batch size: 89, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:01:01,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-11 22:01:15,784 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-11 22:01:16,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1306420.0, ans=0.1 2024-08-11 22:01:20,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1306420.0, ans=0.0 2024-08-11 22:01:23,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1306420.0, ans=0.0 2024-08-11 22:01:30,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1306520.0, ans=0.1 2024-08-11 22:01:33,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1306520.0, ans=0.125 2024-08-11 22:01:41,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.11 vs. 
limit=15.0 2024-08-11 22:01:45,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1306620.0, ans=0.125 2024-08-11 22:01:49,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1306620.0, ans=0.125 2024-08-11 22:01:53,430 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 13 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 22:01:58,010 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 22:02:02,933 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 22:02:12,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:13,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1306820.0, ans=0.0 2024-08-11 22:02:14,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 250, loss[loss=0.1115, beats_loss=0.01026, ecapa_loss=0.0001753, whisper_loss=0.09952, over 22128.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01081, ecapa_loss=0.0001862, whisper_loss=0.09264, over 2749686.92 frames. 
], batch size: 86, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:02:27,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1306820.0, ans=0.1 2024-08-11 22:02:33,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1306920.0, ans=0.0 2024-08-11 22:02:34,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:36,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:38,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.80 vs. limit=22.5 2024-08-11 22:02:41,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-11 22:02:43,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1306920.0, ans=0.2 2024-08-11 22:02:51,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1307020.0, ans=0.1 2024-08-11 22:02:57,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.454e+01 2.692e+01 3.153e+01 8.296e+01, threshold=5.384e+01, percent-clipped=2.0 2024-08-11 22:03:00,758 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 22:03:25,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. 
limit=15.0 2024-08-11 22:03:31,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 300, loss[loss=0.08521, beats_loss=0.0125, ecapa_loss=0.0001745, whisper_loss=0.07097, over 20015.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001872, whisper_loss=0.09186, over 2976303.56 frames. ], batch size: 78, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:03:31,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1307320.0, ans=0.2 2024-08-11 22:04:05,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1307520.0, ans=0.125 2024-08-11 22:04:09,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=12.0 2024-08-11 22:04:31,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1307720.0, ans=0.0 2024-08-11 22:04:33,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1307720.0, ans=0.0 2024-08-11 22:04:35,837 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 22:04:44,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1307720.0, ans=0.125 2024-08-11 22:04:46,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 350, loss[loss=0.1067, beats_loss=0.007754, ecapa_loss=0.0001856, whisper_loss=0.09707, over 14992.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001863, whisper_loss=0.09099, over 3139952.35 frames. 
], batch size: 56, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:04:48,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1307820.0, ans=0.0 2024-08-11 22:04:51,213 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 22:05:00,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1307920.0, ans=0.05 2024-08-11 22:05:09,539 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 22:05:10,949 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-11 22:05:26,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.513e+01 2.913e+01 3.282e+01 4.748e+01, threshold=5.825e+01, percent-clipped=0.0 2024-08-11 22:05:26,283 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 22:05:44,449 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 22:05:52,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1308220.0, ans=0.0 2024-08-11 22:06:01,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 400, loss[loss=0.07385, beats_loss=0.01161, ecapa_loss=0.0002219, whisper_loss=0.06002, over 14548.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.0001849, whisper_loss=0.09098, over 3279200.41 frames. ], batch size: 60, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:06:05,935 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 22:06:06,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1308320.0, ans=0.125 2024-08-11 22:06:12,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1308320.0, ans=0.125 2024-08-11 22:06:26,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1308420.0, ans=0.2 2024-08-11 22:06:27,143 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 22:06:34,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-11 22:06:42,845 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 22:06:48,641 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 22:06:49,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1308620.0, ans=0.1 2024-08-11 22:06:51,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-11 22:06:57,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1308620.0, ans=0.125 2024-08-11 22:07:17,936 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 450, loss[loss=0.1043, beats_loss=0.01198, ecapa_loss=0.0001651, whisper_loss=0.09067, over 18313.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01102, ecapa_loss=0.0001857, whisper_loss=0.09055, over 3386319.61 frames. 
], batch size: 70, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:07:52,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1309020.0, ans=0.5 2024-08-11 22:07:58,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.488e+01 3.017e+01 3.515e+01 8.522e+01, threshold=6.035e+01, percent-clipped=1.0 2024-08-11 22:08:11,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2024-08-11 22:08:19,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1309220.0, ans=0.07 2024-08-11 22:08:20,653 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-11 22:08:20,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1309220.0, ans=0.125 2024-08-11 22:08:26,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1309220.0, ans=0.125 2024-08-11 22:08:32,656 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 22:08:33,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 500, loss[loss=0.1119, beats_loss=0.01265, ecapa_loss=0.0001816, whisper_loss=0.09746, over 21965.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01085, ecapa_loss=0.0001855, whisper_loss=0.09268, over 3524052.61 frames. 
], batch size: 88, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:08:46,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309320.0, ans=0.1 2024-08-11 22:08:50,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-11 22:09:01,869 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:09:01,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1309420.0, ans=0.09899494936611666 2024-08-11 22:09:02,811 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 22:09:04,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1309520.0, ans=0.125 2024-08-11 22:09:07,094 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 22:09:13,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309520.0, ans=0.1 2024-08-11 22:09:15,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1309520.0, ans=0.0 2024-08-11 22:09:19,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1309620.0, ans=0.2 2024-08-11 22:09:29,662 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 22:09:40,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1309720.0, ans=0.125 2024-08-11 22:09:47,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1309720.0, ans=0.0 2024-08-11 22:09:48,164 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 22:09:50,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 550, loss[loss=0.09202, beats_loss=0.01255, ecapa_loss=0.0001719, whisper_loss=0.07776, over 18271.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001856, whisper_loss=0.0926, over 3584977.40 frames. ], batch size: 74, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:10:01,680 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 22:10:11,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1309920.0, ans=0.2 2024-08-11 22:10:25,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1310020.0, ans=0.125 2024-08-11 22:10:32,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.567e+01 3.001e+01 3.540e+01 6.068e+01, threshold=6.003e+01, percent-clipped=1.0 2024-08-11 22:10:44,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-11 22:10:52,473 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 22:11:01,843 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 22:11:03,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1310220.0, ans=0.125 2024-08-11 22:11:07,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 600, loss[loss=0.1018, beats_loss=0.01138, ecapa_loss=0.0002314, whisper_loss=0.08813, over 15012.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01087, ecapa_loss=0.0001846, whisper_loss=0.09304, over 3646517.53 frames. ], batch size: 62, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:11:48,252 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 22:12:23,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 650, loss[loss=0.1214, beats_loss=0.008466, ecapa_loss=0.0001632, whisper_loss=0.1113, over 22480.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.000184, whisper_loss=0.09271, over 3693016.22 frames. ], batch size: 84, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:12:52,071 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 22:12:58,759 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-11 22:13:02,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1311020.0, ans=0.125 2024-08-11 22:13:04,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.576e+01 2.785e+01 3.016e+01 3.995e+01, threshold=5.570e+01, percent-clipped=0.0 2024-08-11 22:13:05,103 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:13:05,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1311020.0, ans=0.0 2024-08-11 22:13:08,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1311120.0, ans=0.125 2024-08-11 22:13:25,293 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 22:13:25,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1311220.0, ans=0.0 2024-08-11 22:13:28,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1311220.0, ans=0.2 2024-08-11 22:13:39,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-11 22:13:40,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 700, loss[loss=0.1018, beats_loss=0.01117, ecapa_loss=0.0002189, whisper_loss=0.08849, over 14902.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001845, whisper_loss=0.09268, over 3729311.90 frames. 
], batch size: 58, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:13:47,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=12.0 2024-08-11 22:13:54,723 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 22:14:37,921 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 22:14:40,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1311620.0, ans=0.0 2024-08-11 22:14:41,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1311620.0, ans=10.0 2024-08-11 22:14:51,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1311720.0, ans=0.0 2024-08-11 22:14:58,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1311720.0, ans=0.0 2024-08-11 22:15:00,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 750, loss[loss=0.09917, beats_loss=0.01283, ecapa_loss=0.0001499, whisper_loss=0.08484, over 16140.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.0001842, whisper_loss=0.09213, over 3733961.02 frames. ], batch size: 63, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:15:30,346 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 22:15:30,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1312020.0, ans=0.0 2024-08-11 22:15:33,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1312020.0, ans=0.02 2024-08-11 22:15:41,509 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.583e+01 2.835e+01 3.303e+01 6.155e+01, threshold=5.670e+01, percent-clipped=2.0 2024-08-11 22:16:00,844 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 22:16:10,017 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 22:16:17,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 800, loss[loss=0.1051, beats_loss=0.009534, ecapa_loss=0.0001645, whisper_loss=0.09391, over 16426.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01113, ecapa_loss=0.0001833, whisper_loss=0.09118, over 3753079.92 frames. ], batch size: 60, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:16:28,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1312320.0, ans=0.0 2024-08-11 22:16:30,032 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 22:16:59,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1312520.0, ans=0.125 2024-08-11 22:17:08,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1312520.0, ans=0.125 2024-08-11 22:17:20,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1312620.0, ans=0.0 2024-08-11 22:17:22,446 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 22:17:29,681 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 22:17:40,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1312720.0, ans=0.0 2024-08-11 22:17:42,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1312720.0, ans=0.05 2024-08-11 22:17:42,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1312720.0, ans=0.125 2024-08-11 22:17:52,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 850, loss[loss=0.08305, beats_loss=0.01288, ecapa_loss=0.0001827, whisper_loss=0.06835, over 13221.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01111, ecapa_loss=0.0001833, whisper_loss=0.091, over 3752281.87 frames. ], batch size: 55, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:18:00,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2024-08-11 22:18:04,232 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 22:18:08,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1312920.0, ans=0.1 2024-08-11 22:18:38,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. 
limit=22.5 2024-08-11 22:18:41,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.539e+01 2.872e+01 3.296e+01 5.215e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 22:18:44,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1313020.0, ans=0.2 2024-08-11 22:18:55,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1313120.0, ans=0.125 2024-08-11 22:19:14,972 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 22:19:20,390 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 22:19:25,994 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 900, loss[loss=0.1083, beats_loss=0.01059, ecapa_loss=0.0001833, whisper_loss=0.09586, over 24111.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01113, ecapa_loss=0.0001813, whisper_loss=0.09082, over 3772178.12 frames. ], batch size: 95, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:20:06,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1313520.0, ans=0.05 2024-08-11 22:20:30,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-11 22:20:54,633 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 22:21:02,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 950, loss[loss=0.1101, beats_loss=0.01006, ecapa_loss=0.0001953, whisper_loss=0.09807, over 22375.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01115, ecapa_loss=0.000181, whisper_loss=0.09087, over 3799084.45 frames. 
], batch size: 90, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:21:32,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-11 22:21:44,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1314020.0, ans=0.1 2024-08-11 22:21:51,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.622e+01 2.859e+01 3.329e+01 4.580e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-11 22:22:10,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1314120.0, ans=0.0 2024-08-11 22:22:16,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0 2024-08-11 22:22:31,672 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 22:22:34,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1000, loss[loss=0.1097, beats_loss=0.009352, ecapa_loss=0.0001993, whisper_loss=0.09834, over 13654.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01118, ecapa_loss=0.000179, whisper_loss=0.09081, over 3768402.13 frames. ], batch size: 55, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:23:16,801 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 22:23:23,716 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:23:37,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1314620.0, ans=0.0 2024-08-11 22:23:39,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-11 22:23:44,404 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 22:24:01,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1050, loss[loss=0.09238, beats_loss=0.01427, ecapa_loss=0.0001352, whisper_loss=0.07676, over 23469.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01118, ecapa_loss=0.000178, whisper_loss=0.09097, over 3767067.76 frames. ], batch size: 93, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:24:21,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1314920.0, ans=0.0 2024-08-11 22:24:26,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1314920.0, ans=0.0 2024-08-11 22:24:26,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. 
limit=15.0 2024-08-11 22:24:37,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1315020.0, ans=0.125 2024-08-11 22:24:38,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.421e+01 2.684e+01 3.099e+01 9.894e+01, threshold=5.368e+01, percent-clipped=2.0 2024-08-11 22:24:38,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1315020.0, ans=0.125 2024-08-11 22:24:42,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2024-08-11 22:24:52,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1315120.0, ans=0.125 2024-08-11 22:25:09,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1100, loss[loss=0.09196, beats_loss=0.01362, ecapa_loss=0.0001527, whisper_loss=0.07681, over 16154.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01119, ecapa_loss=0.0001771, whisper_loss=0.09138, over 3794958.35 frames. ], batch size: 62, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:25:12,245 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 26 from Vox, 28 from AS 2024-08-11 22:25:12,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-11 22:25:18,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1315320.0, ans=10.0 2024-08-11 22:25:20,633 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
32 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 22:25:27,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1315420.0, ans=0.1 2024-08-11 22:25:32,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1315420.0, ans=0.0 2024-08-11 22:25:32,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1315420.0, ans=0.125 2024-08-11 22:25:38,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1315520.0, ans=0.0 2024-08-11 22:25:42,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1315520.0, ans=0.125 2024-08-11 22:25:48,903 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 22:25:58,666 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 from AS 2024-08-11 22:26:08,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1315720.0, ans=0.0 2024-08-11 22:26:17,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1150, loss[loss=0.09838, beats_loss=0.01083, ecapa_loss=0.0001749, whisper_loss=0.0858, over 15765.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001785, whisper_loss=0.09195, over 3797188.50 frames. ], batch size: 58, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:26:25,037 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 from AS 2024-08-11 22:26:28,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. 
limit=22.5 2024-08-11 22:26:29,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=12.0 2024-08-11 22:26:30,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1315920.0, ans=0.125 2024-08-11 22:26:44,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1316020.0, ans=0.1 2024-08-11 22:26:54,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.627e+01 2.933e+01 3.282e+01 4.582e+01, threshold=5.866e+01, percent-clipped=0.0 2024-08-11 22:27:20,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1316220.0, ans=15.0 2024-08-11 22:27:26,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1200, loss[loss=0.0711, beats_loss=0.01307, ecapa_loss=0.000183, whisper_loss=0.0562, over 13312.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001788, whisper_loss=0.0925, over 3808160.67 frames. ], batch size: 56, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:27:28,643 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 13 from Vox, 45 from AS 2024-08-11 22:27:35,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1316320.0, ans=0.0 2024-08-11 22:27:46,504 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 22:27:48,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1316420.0, ans=0.1 2024-08-11 22:27:50,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=12.0 2024-08-11 22:27:52,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1316420.0, ans=0.125 2024-08-11 22:28:02,803 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 14 from Vox, 39 from AS 2024-08-11 22:28:02,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1316520.0, ans=0.125 2024-08-11 22:28:14,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1316620.0, ans=0.2 2024-08-11 22:28:19,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1316620.0, ans=0.125 2024-08-11 22:28:28,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-11 22:28:35,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1250, loss[loss=0.09539, beats_loss=0.01106, ecapa_loss=0.0001711, whisper_loss=0.08262, over 17000.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.000179, whisper_loss=0.09241, over 3807595.30 frames. ], batch size: 67, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:28:49,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. 
limit=12.0 2024-08-11 22:29:11,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.416e+01 2.652e+01 2.971e+01 4.212e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-11 22:29:13,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2024-08-11 22:29:24,807 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.552e+02 2024-08-11 22:29:38,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-08-11 22:29:39,280 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 from AS 2024-08-11 22:29:43,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1300, loss[loss=0.1146, beats_loss=0.009647, ecapa_loss=0.0002037, whisper_loss=0.1029, over 19253.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001805, whisper_loss=0.09252, over 3808282.45 frames. ], batch size: 77, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:29:43,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1317320.0, ans=0.1 2024-08-11 22:30:04,225 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
18 from LS+wenet, 13 from Vox, 34 from AS 2024-08-11 22:30:15,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1317520.0, ans=0.0 2024-08-11 22:30:26,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1317620.0, ans=0.0 2024-08-11 22:30:30,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1317620.0, ans=0.2 2024-08-11 22:30:41,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2024-08-11 22:30:51,448 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1350, loss[loss=0.08094, beats_loss=0.01238, ecapa_loss=0.000147, whisper_loss=0.0671, over 17797.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001793, whisper_loss=0.09198, over 3811922.10 frames. ], batch size: 69, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:30:54,930 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 from AS 2024-08-11 22:31:21,266 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 from AS 2024-08-11 22:31:26,448 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 from AS 2024-08-11 22:31:28,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.605e+01 2.891e+01 3.294e+01 5.251e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-11 22:31:44,559 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-11 22:31:52,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1318220.0, ans=0.0 2024-08-11 22:31:54,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1318220.0, ans=0.2 2024-08-11 22:31:59,214 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 from AS 2024-08-11 22:32:00,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1400, loss[loss=0.09061, beats_loss=0.01191, ecapa_loss=0.000183, whisper_loss=0.07687, over 22023.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01094, ecapa_loss=0.0001816, whisper_loss=0.092, over 3828209.57 frames. ], batch size: 92, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:32:07,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2024-08-11 22:32:10,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1318320.0, ans=0.035 2024-08-11 22:32:12,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2024-08-11 22:32:23,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318420.0, ans=0.1 2024-08-11 22:32:47,981 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:32:48,934 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 29 from Vox, 28 from AS 2024-08-11 22:33:11,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1450, loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0002018, whisper_loss=0.08916, over 18923.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.000182, whisper_loss=0.09147, over 3814647.19 frames. ], batch size: 80, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:33:39,376 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.775e-01 2024-08-11 22:34:12,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.426e+01 2.679e+01 3.124e+01 8.618e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-11 22:34:13,226 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 22:34:32,841 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 22:34:45,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1500, loss[loss=0.1055, beats_loss=0.01263, ecapa_loss=0.0002207, whisper_loss=0.09067, over 20872.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01104, ecapa_loss=0.0001804, whisper_loss=0.09095, over 3814103.01 frames. ], batch size: 90, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:34:50,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=15.0 2024-08-11 22:34:57,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2024-08-11 22:35:01,848 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.923e-01 2024-08-11 22:35:05,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1319420.0, ans=0.09899494936611666 2024-08-11 22:35:22,881 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
28 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 22:35:36,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2024-08-11 22:35:40,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1319620.0, ans=0.0 2024-08-11 22:35:52,772 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 29 from LS+wenet, 20 from Vox, 25 from AS 2024-08-11 22:35:57,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1550, loss[loss=0.08284, beats_loss=0.01166, ecapa_loss=0.0001817, whisper_loss=0.06936, over 19199.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.0001786, whisper_loss=0.09147, over 3822940.62 frames. ], batch size: 75, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:35:58,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1319820.0, ans=0.5 2024-08-11 22:36:12,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1319920.0, ans=0.125 2024-08-11 22:36:14,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=10.0 2024-08-11 22:36:27,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1320020.0, ans=0.125 2024-08-11 22:36:34,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1320020.0, ans=0.1 2024-08-11 22:36:37,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.581e+01 2.864e+01 3.252e+01 1.978e+02, threshold=5.728e+01, percent-clipped=3.0 2024-08-11 22:36:40,896 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 from AS 2024-08-11 22:37:05,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1320220.0, ans=0.125 2024-08-11 22:37:09,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1600, loss[loss=0.1128, beats_loss=0.01123, ecapa_loss=0.000183, whisper_loss=0.09976, over 18389.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001783, whisper_loss=0.0912, over 3818580.32 frames. ], batch size: 72, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:37:34,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1320420.0, ans=0.125 2024-08-11 22:37:35,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2024-08-11 22:37:37,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1320520.0, ans=0.5 2024-08-11 22:37:39,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1320520.0, ans=0.125 2024-08-11 22:37:41,608 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 23 from Vox, 29 from AS 2024-08-11 22:37:48,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1320520.0, ans=0.125 2024-08-11 22:37:49,336 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 from AS 2024-08-11 22:37:54,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0 2024-08-11 22:37:57,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1320620.0, ans=0.05 2024-08-11 22:38:05,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1320720.0, ans=0.1 2024-08-11 22:38:08,459 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 from AS 2024-08-11 22:38:16,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1320820.0, ans=0.0 2024-08-11 22:38:17,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1650, loss[loss=0.1088, beats_loss=0.01112, ecapa_loss=0.000168, whisper_loss=0.09602, over 19840.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.0001785, whisper_loss=0.09167, over 3829419.98 frames. ], batch size: 75, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:38:43,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1320920.0, ans=0.0 2024-08-11 22:38:44,213 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
30 from LS+wenet, 28 from Vox, 29 from AS 2024-08-11 22:38:55,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.537e+01 2.816e+01 3.147e+01 5.584e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-11 22:39:00,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1321120.0, ans=0.0 2024-08-11 22:39:27,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1700, loss[loss=0.1133, beats_loss=0.01214, ecapa_loss=0.0001576, whisper_loss=0.09955, over 23807.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01089, ecapa_loss=0.0001801, whisper_loss=0.09266, over 3838869.77 frames. ], batch size: 91, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:39:42,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1321420.0, ans=0.0 2024-08-11 22:39:46,533 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 from AS 2024-08-11 22:39:47,891 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 17 from LS+wenet, 31 from Vox, 44 from AS 2024-08-11 22:39:54,621 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 from AS 2024-08-11 22:40:06,880 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-11 22:40:21,496 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 22:40:34,616 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 9 from Vox, 29 from AS 2024-08-11 22:40:35,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1750, loss[loss=0.1075, beats_loss=0.01009, ecapa_loss=0.0001171, whisper_loss=0.09619, over 16036.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001796, whisper_loss=0.09246, over 3841662.58 frames. 
], batch size: 57, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:40:38,474 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 20 from Vox, 35 from AS 2024-08-11 22:40:53,525 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 from AS 2024-08-11 22:41:11,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2024-08-11 22:41:12,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.558e+01 2.941e+01 3.375e+01 5.382e+01, threshold=5.883e+01, percent-clipped=0.0 2024-08-11 22:41:16,234 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 from AS 2024-08-11 22:41:44,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2024-08-11 22:41:45,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1800, loss[loss=0.1325, beats_loss=0.007683, ecapa_loss=0.0002038, whisper_loss=0.1228, over 23448.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01107, ecapa_loss=0.0001794, whisper_loss=0.09155, over 3813452.72 frames. ], batch size: 93, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:41:48,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1322320.0, ans=0.0 2024-08-11 22:41:58,349 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:42:22,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2024-08-11 22:42:25,990 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-11 22:42:28,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1322620.0, ans=0.0 2024-08-11 22:42:43,719 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 33 from LS+wenet, 29 from Vox, 34 from AS 2024-08-11 22:42:58,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1850, loss[loss=0.09638, beats_loss=0.01036, ecapa_loss=0.0001819, whisper_loss=0.0842, over 14542.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.00018, whisper_loss=0.09179, over 3827011.71 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:43:21,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1322920.0, ans=0.1 2024-08-11 22:43:34,789 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:43:34,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1323020.0, ans=0.0 2024-08-11 22:43:39,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.593e+01 2.917e+01 3.347e+01 7.328e+01, threshold=5.834e+01, percent-clipped=1.0 2024-08-11 22:43:40,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1323020.0, ans=0.125 2024-08-11 22:43:42,830 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 22:43:44,235 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 15 from Vox, 23 from AS 2024-08-11 22:43:44,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1323120.0, ans=0.1 2024-08-11 22:44:00,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1323220.0, ans=0.0 2024-08-11 22:44:08,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1323220.0, ans=0.1 2024-08-11 22:44:12,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1900, loss[loss=0.07748, beats_loss=0.01199, ecapa_loss=0.0001906, whisper_loss=0.06358, over 14335.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01101, ecapa_loss=0.0001805, whisper_loss=0.09159, over 3815829.65 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:44:12,634 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 33 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 22:44:22,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2024-08-11 22:44:39,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1323420.0, ans=0.125 2024-08-11 22:44:39,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1323420.0, ans=15.0 2024-08-11 22:44:40,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1323520.0, ans=0.125 2024-08-11 22:44:44,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1323520.0, ans=0.0 2024-08-11 22:44:50,100 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 22:44:58,947 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 22:45:01,768 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 21 from Vox, 22 from AS 2024-08-11 22:45:07,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1323620.0, ans=0.0 2024-08-11 22:45:11,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1323720.0, ans=0.0 2024-08-11 22:45:24,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 1950, loss[loss=0.08717, beats_loss=0.0117, ecapa_loss=0.000182, whisper_loss=0.07364, over 16745.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01105, ecapa_loss=0.0001826, whisper_loss=0.09101, over 3802670.69 frames. ], batch size: 68, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:45:27,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1323820.0, ans=0.125 2024-08-11 22:45:37,221 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.493e-03 2024-08-11 22:45:39,455 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 24 from Vox, 17 from AS 2024-08-11 22:45:53,206 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:46:02,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.530e+01 2.923e+01 3.581e+01 1.963e+02, threshold=5.846e+01, percent-clipped=3.0 2024-08-11 22:46:09,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1324120.0, ans=0.1 2024-08-11 22:46:16,346 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:46:19,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=12.0 2024-08-11 22:46:34,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1324220.0, ans=0.125 2024-08-11 22:46:35,511 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 from AS 2024-08-11 22:46:35,780 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:46:36,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2000, loss[loss=0.09795, beats_loss=0.01134, ecapa_loss=0.0002017, whisper_loss=0.08459, over 22472.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01106, ecapa_loss=0.0001836, whisper_loss=0.0911, over 3789560.21 frames. ], batch size: 88, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:46:40,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1324320.0, ans=0.0 2024-08-11 22:46:58,434 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
24 from LS+wenet, 27 from Vox, 36 from AS 2024-08-11 22:47:10,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1324520.0, ans=0.2 2024-08-11 22:47:22,470 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 from AS 2024-08-11 22:47:31,891 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 from AS 2024-08-11 22:47:39,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1324720.0, ans=0.0 2024-08-11 22:47:51,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2050, loss[loss=0.08502, beats_loss=0.01417, ecapa_loss=0.0001239, whisper_loss=0.06961, over 19905.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001817, whisper_loss=0.0914, over 3783621.28 frames. ], batch size: 74, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:48:00,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324820.0, ans=0.1 2024-08-11 22:48:11,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1324920.0, ans=0.1 2024-08-11 22:48:30,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.631e+01 3.014e+01 3.370e+01 4.766e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 22:48:36,941 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 31 from LS+wenet, 15 from Vox, 26 from AS 2024-08-11 22:48:38,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=15.0 2024-08-11 22:48:44,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1325120.0, ans=0.1 2024-08-11 22:48:45,234 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 37 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 22:48:49,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1325220.0, ans=0.2 2024-08-11 22:48:57,671 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 12 from Vox, 31 from AS 2024-08-11 22:49:03,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2100, loss[loss=0.1033, beats_loss=0.01018, ecapa_loss=0.0001537, whisper_loss=0.09161, over 16347.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001833, whisper_loss=0.09167, over 3780787.96 frames. ], batch size: 60, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:49:10,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1325320.0, ans=0.125 2024-08-11 22:49:12,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0 2024-08-11 22:49:13,803 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 22:49:14,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1325320.0, ans=0.125 2024-08-11 22:49:27,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1325420.0, ans=0.0 2024-08-11 22:49:52,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1325620.0, ans=0.0 2024-08-11 22:49:54,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1325620.0, ans=0.125 2024-08-11 22:49:56,999 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 22:50:00,920 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:50:03,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1325720.0, ans=0.0 2024-08-11 22:50:17,558 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 22:50:18,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2150, loss[loss=0.09406, beats_loss=0.01132, ecapa_loss=0.0001695, whisper_loss=0.08104, over 22887.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001839, whisper_loss=0.09179, over 3784592.89 frames. ], batch size: 92, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:50:18,858 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 22:50:33,514 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
29 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 22:50:33,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1325920.0, ans=0.2 2024-08-11 22:50:57,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.560e+01 2.828e+01 3.267e+01 5.795e+01, threshold=5.656e+01, percent-clipped=0.0 2024-08-11 22:51:00,571 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 22:51:01,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1326120.0, ans=0.125 2024-08-11 22:51:01,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-08-11 22:51:17,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1326220.0, ans=0.125 2024-08-11 22:51:28,635 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-11 22:51:31,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2200, loss[loss=0.1045, beats_loss=0.01143, ecapa_loss=0.0001933, whisper_loss=0.09115, over 17931.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01112, ecapa_loss=0.0001823, whisper_loss=0.09238, over 3793033.41 frames. 
], batch size: 72, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:51:38,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1326320.0, ans=0.2 2024-08-11 22:51:40,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1326320.0, ans=0.125 2024-08-11 22:51:56,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1326420.0, ans=0.125 2024-08-11 22:52:06,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326520.0, ans=0.1 2024-08-11 22:52:08,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-08-11 22:52:16,285 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 22:52:19,167 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 22:52:26,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1326620.0, ans=0.0 2024-08-11 22:52:28,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=15.0 2024-08-11 22:52:30,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1326720.0, ans=0.125 2024-08-11 22:52:40,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1326720.0, ans=0.125 2024-08-11 22:52:44,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2250, loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001866, whisper_loss=0.09076, over 15476.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.0001833, whisper_loss=0.09291, over 3824960.71 frames. ], batch size: 61, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:52:45,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326820.0, ans=0.1 2024-08-11 22:52:46,388 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-11 22:53:08,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2024-08-11 22:53:17,977 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 22:53:19,208 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 22:53:22,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1327020.0, ans=0.0 2024-08-11 22:53:24,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.654e+01 2.933e+01 3.292e+01 6.746e+01, threshold=5.867e+01, percent-clipped=1.0 2024-08-11 22:53:50,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1327220.0, ans=0.125 2024-08-11 22:53:50,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1327220.0, ans=0.125 2024-08-11 22:53:50,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1327220.0, ans=0.2 2024-08-11 22:53:58,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2300, loss[loss=0.1031, beats_loss=0.01138, ecapa_loss=0.0001813, whisper_loss=0.08995, over 19925.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01116, ecapa_loss=0.0001836, whisper_loss=0.09301, over 3840898.73 frames. ], batch size: 81, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:54:05,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1327320.0, ans=0.0 2024-08-11 22:54:10,242 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 22:54:27,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-11 22:54:43,021 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 22:55:12,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2350, loss[loss=0.1231, beats_loss=0.008367, ecapa_loss=0.0001591, whisper_loss=0.1131, over 18144.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01111, ecapa_loss=0.0001841, whisper_loss=0.09281, over 3813139.37 frames. ], batch size: 67, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:55:18,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=15.0 2024-08-11 22:55:18,839 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 22:55:23,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1327820.0, ans=0.125 2024-08-11 22:55:26,113 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 22:55:32,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1327920.0, ans=0.1 2024-08-11 22:55:52,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.549e+01 2.872e+01 3.307e+01 6.850e+01, threshold=5.744e+01, percent-clipped=1.0 2024-08-11 22:56:15,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1328220.0, ans=0.2 2024-08-11 22:56:20,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1328220.0, ans=0.0 2024-08-11 22:56:21,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1328220.0, ans=0.125 2024-08-11 22:56:25,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2400, loss[loss=0.1074, beats_loss=0.01191, ecapa_loss=0.0001961, 
whisper_loss=0.09351, over 17663.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01103, ecapa_loss=0.0001851, whisper_loss=0.09319, over 3814965.80 frames. ], batch size: 71, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:56:30,255 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 22:56:34,512 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 22:56:45,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1328420.0, ans=0.125 2024-08-11 22:56:55,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1328520.0, ans=0.0 2024-08-11 22:57:01,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328520.0, ans=0.1 2024-08-11 22:57:06,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:57:19,253 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 22:57:35,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1328720.0, ans=0.05 2024-08-11 22:57:41,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2450, loss[loss=0.1236, beats_loss=0.008619, ecapa_loss=0.0001653, whisper_loss=0.1133, over 17339.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01094, ecapa_loss=0.0001853, whisper_loss=0.09337, over 3822893.72 frames. 
], batch size: 67, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:58:17,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329020.0, ans=0.1 2024-08-11 22:58:21,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.487e+01 2.806e+01 3.226e+01 5.199e+01, threshold=5.611e+01, percent-clipped=0.0 2024-08-11 22:58:34,038 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 22:58:55,306 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.398e-01 2024-08-11 22:58:55,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329320.0, ans=0.1 2024-08-11 22:58:56,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2500, loss[loss=0.09803, beats_loss=0.01282, ecapa_loss=0.0001692, whisper_loss=0.08351, over 20117.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01098, ecapa_loss=0.0001844, whisper_loss=0.09295, over 3825992.55 frames. ], batch size: 80, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:59:06,074 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 22:59:32,271 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 22:59:32,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1329520.0, ans=0.0 2024-08-11 22:59:33,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329520.0, ans=0.1 2024-08-11 22:59:46,405 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 23:00:03,863 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:00:09,958 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 23:00:14,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2550, loss[loss=0.1162, beats_loss=0.008463, ecapa_loss=0.0001723, whisper_loss=0.106, over 21571.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01099, ecapa_loss=0.0001842, whisper_loss=0.09353, over 3848857.95 frames. ], batch size: 81, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:00:41,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1329920.0, ans=0.95 2024-08-11 23:00:57,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.600e+01 2.873e+01 3.308e+01 4.841e+01, threshold=5.745e+01, percent-clipped=0.0 2024-08-11 23:01:09,637 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 23:01:11,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1330120.0, ans=0.125 2024-08-11 23:01:29,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1330220.0, ans=0.2 2024-08-11 23:01:30,467 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 23:01:34,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2600, loss[loss=0.09582, beats_loss=0.01333, ecapa_loss=0.0001387, whisper_loss=0.08111, over 19353.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0109, ecapa_loss=0.0001858, whisper_loss=0.09316, over 3826046.14 frames. 
], batch size: 76, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:01:38,085 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 23:01:39,751 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:02:10,543 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 23:02:12,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1330520.0, ans=0.125 2024-08-11 23:02:13,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1330520.0, ans=0.2 2024-08-11 23:02:17,002 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 23:02:30,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1330620.0, ans=0.125 2024-08-11 23:02:33,150 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 23:02:35,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1330720.0, ans=0.0 2024-08-11 23:02:37,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1330720.0, ans=0.0 2024-08-11 23:02:46,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1330720.0, ans=10.0 2024-08-11 23:02:48,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. 
limit=15.0 2024-08-11 23:02:51,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2650, loss[loss=0.1108, beats_loss=0.01333, ecapa_loss=0.0001548, whisper_loss=0.09587, over 23376.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.0001866, whisper_loss=0.09274, over 3847383.37 frames. ], batch size: 92, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:03:06,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1330920.0, ans=0.125 2024-08-11 23:03:14,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1330920.0, ans=0.2 2024-08-11 23:03:35,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.699e+01 2.993e+01 3.555e+01 9.155e+01, threshold=5.987e+01, percent-clipped=1.0 2024-08-11 23:03:36,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-11 23:03:45,400 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 23:03:51,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-11 23:03:53,809 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 23:04:05,751 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 23:04:13,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2700, loss[loss=0.08717, beats_loss=0.01253, ecapa_loss=0.0002356, whisper_loss=0.07229, over 21856.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01098, ecapa_loss=0.0001871, whisper_loss=0.09239, over 3845030.39 frames. 
], batch size: 94, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:04:23,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2024-08-11 23:04:40,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1331420.0, ans=0.125 2024-08-11 23:05:06,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331620.0, ans=0.1 2024-08-11 23:05:09,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1331620.0, ans=0.0 2024-08-11 23:05:12,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1331620.0, ans=0.125 2024-08-11 23:05:21,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1331720.0, ans=0.125 2024-08-11 23:05:32,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2750, loss[loss=0.0935, beats_loss=0.01184, ecapa_loss=0.0001562, whisper_loss=0.08011, over 15705.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01107, ecapa_loss=0.0001869, whisper_loss=0.09246, over 3872389.36 frames. ], batch size: 62, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:05:39,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331820.0, ans=0.1 2024-08-11 23:05:39,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-11 23:05:55,383 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:06:02,597 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 23:06:10,685 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 23:06:18,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.646e+01 2.996e+01 3.308e+01 5.705e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-11 23:06:20,410 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 23:06:25,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1332120.0, ans=0.0 2024-08-11 23:06:32,809 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:06:43,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1332220.0, ans=0.0 2024-08-11 23:06:44,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-11 23:06:54,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2800, loss[loss=0.1001, beats_loss=0.01289, ecapa_loss=0.000216, whisper_loss=0.08508, over 20925.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001857, whisper_loss=0.09221, over 3856743.10 frames. ], batch size: 87, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:07:03,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=12.0 2024-08-11 23:07:10,423 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 23:07:17,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1332420.0, ans=0.125 2024-08-11 23:07:27,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1332520.0, ans=0.05 2024-08-11 23:07:34,589 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 23:07:38,800 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-11 23:07:42,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1332620.0, ans=0.1 2024-08-11 23:07:47,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1332620.0, ans=6.0 2024-08-11 23:07:48,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1332620.0, ans=0.025 2024-08-11 23:08:00,418 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 23:08:02,286 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 23:08:10,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1332720.0, ans=0.0 2024-08-11 23:08:12,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2850, loss[loss=0.1132, beats_loss=0.009795, ecapa_loss=0.0002067, whisper_loss=0.1013, over 21978.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01117, ecapa_loss=0.0001861, whisper_loss=0.0923, over 3829640.81 frames. 
], batch size: 88, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:08:14,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1332820.0, ans=0.1 2024-08-11 23:08:18,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-08-11 23:08:27,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.98 vs. limit=22.5 2024-08-11 23:08:48,964 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 23:08:55,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2024-08-11 23:08:57,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.704e+01 2.991e+01 3.400e+01 6.217e+01, threshold=5.982e+01, percent-clipped=1.0 2024-08-11 23:08:58,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1333020.0, ans=0.125 2024-08-11 23:08:59,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2024-08-11 23:09:11,381 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 23:09:33,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2900, loss[loss=0.1006, beats_loss=0.01311, ecapa_loss=0.0001888, whisper_loss=0.08565, over 21920.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01121, ecapa_loss=0.0001865, whisper_loss=0.09247, over 3819393.73 frames. 
], batch size: 92, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:09:37,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1333320.0, ans=0.125 2024-08-11 23:09:38,435 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 23:09:42,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1333320.0, ans=0.0 2024-08-11 23:09:42,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1333320.0, ans=0.1 2024-08-11 23:09:56,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1333420.0, ans=0.0 2024-08-11 23:10:06,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1333520.0, ans=0.125 2024-08-11 23:10:11,230 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 23:10:20,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-08-11 23:10:24,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1333620.0, ans=0.0 2024-08-11 23:10:30,084 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 23:10:30,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1333620.0, ans=0.125 2024-08-11 23:10:34,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1333620.0, ans=0.0 2024-08-11 23:10:44,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1333720.0, ans=0.0 2024-08-11 23:10:45,929 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 23:10:46,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1333720.0, ans=0.0 2024-08-11 23:10:54,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 2950, loss[loss=0.1161, beats_loss=0.01085, ecapa_loss=0.0001769, whisper_loss=0.1035, over 19783.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001885, whisper_loss=0.09253, over 3850216.62 frames. ], batch size: 79, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:10:58,580 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 23:11:25,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-08-11 23:11:29,309 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 23:11:40,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.642e+01 2.977e+01 3.342e+01 4.548e+01, threshold=5.953e+01, percent-clipped=0.0 2024-08-11 23:11:52,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1334120.0, ans=0.125 2024-08-11 23:11:53,985 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 23:12:04,275 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 23:12:08,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1334220.0, ans=0.125 2024-08-11 23:12:10,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1334220.0, ans=0.0 2024-08-11 23:12:15,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3000, loss[loss=0.1095, beats_loss=0.01037, ecapa_loss=0.0002011, whisper_loss=0.09712, over 21131.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.000188, whisper_loss=0.09283, over 3902229.03 frames. ], batch size: 87, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:12:15,911 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-11 23:12:58,479 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006225, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 23:13:14,979 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on SV_voxceleb1: loss=0.004936, beats_loss=0, ecapa_loss=0.0004936, whisper_loss=0, over 939242.00 frames. 2024-08-11 23:15:19,850 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on AT_audioset: loss=0.02462, beats_loss=0.02462, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 23:15:19,855 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-11 23:15:28,098 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 23:15:34,852 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 23:15:39,398 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.591e-02 2024-08-11 23:15:52,180 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 23:16:39,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3050, loss[loss=0.1253, beats_loss=0.01052, ecapa_loss=0.0002193, whisper_loss=0.1126, over 22114.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01114, ecapa_loss=0.0001892, whisper_loss=0.09308, over 3893727.31 frames. ], batch size: 91, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:17:01,560 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:17:14,435 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 23:17:16,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1335020.0, ans=0.2 2024-08-11 23:17:20,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1335020.0, ans=0.0 2024-08-11 23:17:21,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.698e+01 2.929e+01 3.403e+01 4.861e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-11 23:17:22,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2024-08-11 23:17:33,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2024-08-11 23:17:48,378 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 23:17:49,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1335220.0, ans=10.0 2024-08-11 23:17:53,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3100, loss[loss=0.1169, beats_loss=0.009469, ecapa_loss=0.0001763, whisper_loss=0.1056, over 22819.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01122, ecapa_loss=0.0001893, whisper_loss=0.09221, over 3857368.71 frames. ], batch size: 88, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:17:55,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1335320.0, ans=0.5 2024-08-11 23:18:27,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1335520.0, ans=0.125 2024-08-11 23:18:30,514 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 23:18:30,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1335520.0, ans=0.025 2024-08-11 23:18:30,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335520.0, ans=0.1 2024-08-11 23:18:33,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1335520.0, ans=0.125 2024-08-11 23:18:35,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1335520.0, ans=0.2 2024-08-11 23:18:38,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1335520.0, ans=0.125 2024-08-11 23:18:54,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1335720.0, ans=0.125 2024-08-11 23:18:55,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1335720.0, ans=0.125 2024-08-11 23:19:10,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3150, loss[loss=0.1009, beats_loss=0.01306, ecapa_loss=0.0002086, whisper_loss=0.0858, over 21467.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0113, ecapa_loss=0.000189, whisper_loss=0.09146, over 3843515.74 frames. ], batch size: 93, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:19:13,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. 
limit=15.0 2024-08-11 23:19:16,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1335820.0, ans=0.0 2024-08-11 23:19:52,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.477e+01 2.769e+01 3.279e+01 6.467e+01, threshold=5.538e+01, percent-clipped=1.0 2024-08-11 23:19:59,329 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.644e+05 2024-08-11 23:20:10,430 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 23:20:10,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1336220.0, ans=0.125 2024-08-11 23:20:12,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1336220.0, ans=0.0 2024-08-11 23:20:24,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3200, loss[loss=0.1057, beats_loss=0.01339, ecapa_loss=0.0001723, whisper_loss=0.09055, over 21944.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01134, ecapa_loss=0.0001878, whisper_loss=0.09188, over 3858278.78 frames. ], batch size: 90, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:20:34,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1336320.0, ans=0.125 2024-08-11 23:20:51,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1336420.0, ans=0.0 2024-08-11 23:21:17,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1336620.0, ans=0.2 2024-08-11 23:21:20,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.00 vs. 
limit=15.0 2024-08-11 23:21:22,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-11 23:21:24,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1336720.0, ans=0.125 2024-08-11 23:21:28,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1336720.0, ans=0.2 2024-08-11 23:21:35,970 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 23:21:37,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3250, loss[loss=0.1262, beats_loss=0.01097, ecapa_loss=0.0001763, whisper_loss=0.1135, over 20028.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01126, ecapa_loss=0.0001884, whisper_loss=0.09217, over 3865001.80 frames. ], batch size: 77, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:21:48,911 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 23:21:50,622 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 23:21:53,452 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 23:22:18,937 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.451e+01 2.863e+01 3.216e+01 4.803e+01, threshold=5.726e+01, percent-clipped=0.0 2024-08-11 23:22:22,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1337120.0, ans=0.125 2024-08-11 23:22:35,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1337220.0, ans=0.125 2024-08-11 23:22:43,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1337220.0, ans=0.125 2024-08-11 23:22:47,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1337220.0, ans=0.125 2024-08-11 23:22:52,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3300, loss[loss=0.09037, beats_loss=0.01113, ecapa_loss=0.0001611, whisper_loss=0.07763, over 17446.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01124, ecapa_loss=0.0001886, whisper_loss=0.09181, over 3846340.52 frames. ], batch size: 67, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:22:56,179 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 23:23:15,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. 
limit=15.0 2024-08-11 23:23:27,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1337520.0, ans=0.125 2024-08-11 23:23:27,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1337520.0, ans=0.2 2024-08-11 23:23:32,980 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 23:23:39,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1337620.0, ans=0.0 2024-08-11 23:24:02,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-11 23:24:02,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1337720.0, ans=0.125 2024-08-11 23:24:06,634 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3350, loss[loss=0.1033, beats_loss=0.01097, ecapa_loss=0.0001761, whisper_loss=0.09053, over 18493.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01116, ecapa_loss=0.0001884, whisper_loss=0.09213, over 3898580.32 frames. ], batch size: 74, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:24:08,708 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.329e-02 2024-08-11 23:24:13,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0 2024-08-11 23:24:14,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1337820.0, ans=0.125 2024-08-11 23:24:25,493 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 23:24:28,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1337920.0, ans=0.1 2024-08-11 23:24:37,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1338020.0, ans=0.125 2024-08-11 23:24:40,832 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 23:24:45,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.630e+01 2.898e+01 3.415e+01 6.649e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-11 23:24:52,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1338120.0, ans=0.2 2024-08-11 23:25:00,932 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-11 23:25:12,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1338220.0, ans=0.125 2024-08-11 23:25:13,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1338220.0, ans=0.2 2024-08-11 23:25:16,211 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 36 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 23:25:17,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3400, loss[loss=0.1213, beats_loss=0.0116, ecapa_loss=0.0001686, whisper_loss=0.108, over 24170.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01123, ecapa_loss=0.0001873, whisper_loss=0.09169, over 3895597.94 frames. 
], batch size: 95, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:25:17,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1338320.0, ans=0.0 2024-08-11 23:25:20,090 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 23:25:32,444 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 23:25:38,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1338420.0, ans=0.125 2024-08-11 23:25:40,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1338420.0, ans=0.125 2024-08-11 23:25:52,687 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 23:25:54,280 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 23:25:56,988 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 23:25:57,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=8.0 2024-08-11 23:25:58,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1338620.0, ans=0.0 2024-08-11 23:25:59,656 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 23:26:02,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1338620.0, ans=0.0 2024-08-11 23:26:09,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2024-08-11 23:26:15,469 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 23:26:27,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3450, loss[loss=0.09483, beats_loss=0.01385, ecapa_loss=0.0001542, whisper_loss=0.07944, over 19500.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01116, ecapa_loss=0.0001891, whisper_loss=0.09096, over 3869114.63 frames. ], batch size: 77, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:26:30,493 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-11 23:26:32,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1338820.0, ans=0.125 2024-08-11 23:27:04,096 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 23:27:08,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.641e+01 2.848e+01 3.378e+01 1.355e+02, threshold=5.696e+01, percent-clipped=1.0 2024-08-11 23:27:27,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1339220.0, ans=0.125 2024-08-11 23:27:34,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1339220.0, ans=0.1 2024-08-11 23:27:35,173 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 23:27:38,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3500, loss[loss=0.1185, beats_loss=0.009737, ecapa_loss=0.000173, whisper_loss=0.1071, over 22315.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01117, ecapa_loss=0.0001882, whisper_loss=0.09147, over 3888456.89 frames. ], batch size: 87, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:27:59,325 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 23:28:06,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1339520.0, ans=0.125 2024-08-11 23:28:07,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1339520.0, ans=0.125 2024-08-11 23:28:17,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1339520.0, ans=0.1 2024-08-11 23:28:17,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-11 23:28:28,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1339620.0, ans=0.1 2024-08-11 23:28:45,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1339720.0, ans=0.5 2024-08-11 23:28:48,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1339720.0, ans=0.1 2024-08-11 23:28:49,071 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 23:28:50,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3550, loss[loss=0.1056, beats_loss=0.01066, ecapa_loss=0.0002119, whisper_loss=0.09283, over 21430.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01118, ecapa_loss=0.0001895, whisper_loss=0.09125, over 3895211.89 frames. ], batch size: 88, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:29:01,824 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 23:29:08,406 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 23:29:08,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1339920.0, ans=0.0 2024-08-11 23:29:10,013 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 23:29:22,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1340020.0, ans=0.0 2024-08-11 23:29:32,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.587e+01 2.900e+01 3.239e+01 4.496e+01, threshold=5.800e+01, percent-clipped=0.0 2024-08-11 23:29:36,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=12.0 2024-08-11 23:29:49,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-08-11 23:29:51,731 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 23:30:02,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1340220.0, ans=0.125 2024-08-11 23:30:06,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3600, loss[loss=0.1047, beats_loss=0.0119, ecapa_loss=0.0002116, whisper_loss=0.0907, over 20169.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0111, ecapa_loss=0.0001888, whisper_loss=0.09173, over 3891178.33 frames. ], batch size: 88, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:30:13,017 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 23:30:15,854 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 23:31:18,931 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-11 23:31:23,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3650, loss[loss=0.08905, beats_loss=0.01337, ecapa_loss=0.0002323, whisper_loss=0.07336, over 14162.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001896, whisper_loss=0.09156, over 3876975.92 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:31:23,601 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 23:31:37,099 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 23:31:38,343 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 23:31:47,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1340920.0, ans=0.125 2024-08-11 23:32:07,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=12.0 2024-08-11 23:32:08,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.620e+01 2.825e+01 3.170e+01 6.141e+01, threshold=5.649e+01, percent-clipped=1.0 2024-08-11 23:32:37,398 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 23:32:40,215 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 23:32:41,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3700, loss[loss=0.1055, beats_loss=0.009695, ecapa_loss=0.00014, whisper_loss=0.09437, over 15337.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001884, whisper_loss=0.09142, over 3850273.67 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:32:41,669 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 23:32:50,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1341320.0, ans=0.1 2024-08-11 23:32:51,604 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 23:32:55,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1341320.0, ans=0.125 2024-08-11 23:32:56,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1341420.0, ans=0.0 2024-08-11 23:33:04,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1341420.0, ans=0.125 2024-08-11 23:33:04,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1341420.0, ans=0.5 2024-08-11 23:33:15,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1341520.0, ans=0.125 2024-08-11 23:33:22,600 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 23:33:27,121 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 12 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 23:33:40,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-08-11 23:33:40,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5 2024-08-11 23:33:57,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3750, loss[loss=0.07217, beats_loss=0.01239, ecapa_loss=0.0002075, whisper_loss=0.05771, over 14399.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01111, ecapa_loss=0.0001886, whisper_loss=0.09096, over 3806990.35 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:34:04,016 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 23:34:20,280 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 23:34:24,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5 2024-08-11 23:34:30,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1342020.0, ans=0.0 2024-08-11 23:34:35,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1342020.0, ans=0.0 2024-08-11 23:34:41,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.517e+01 2.756e+01 3.054e+01 4.813e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-11 23:34:55,065 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 23:34:57,805 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 23:34:59,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1342220.0, ans=0.0 2024-08-11 23:35:11,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3800, loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001395, whisper_loss=0.08958, over 21362.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01111, ecapa_loss=0.0001903, whisper_loss=0.0915, over 3806204.24 frames. 
], batch size: 78, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:35:20,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1342320.0, ans=0.5 2024-08-11 23:35:23,027 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 23:35:47,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-11 23:36:09,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-08-11 23:36:09,767 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-11 23:36:19,436 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 23:36:23,751 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 23:36:25,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3850, loss[loss=0.108, beats_loss=0.01283, ecapa_loss=0.0001669, whisper_loss=0.09354, over 15090.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01105, ecapa_loss=0.0001902, whisper_loss=0.09254, over 3830417.07 frames. ], batch size: 61, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:36:32,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-11 23:36:45,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1342920.0, ans=0.1 2024-08-11 23:36:48,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 23:37:00,505 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 23:37:02,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1343020.0, ans=0.0 2024-08-11 23:37:02,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1343020.0, ans=0.2 2024-08-11 23:37:04,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.584e+01 2.893e+01 3.554e+01 5.203e+01, threshold=5.787e+01, percent-clipped=0.0 2024-08-11 23:37:28,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1343220.0, ans=0.04949747468305833 2024-08-11 23:37:30,964 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 23:37:33,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3900, loss[loss=0.09792, beats_loss=0.009819, ecapa_loss=0.0002125, whisper_loss=0.08598, over 17916.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01102, ecapa_loss=0.0001905, whisper_loss=0.09389, over 3859662.15 frames. ], batch size: 73, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:37:34,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-11 23:37:55,707 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 23:38:05,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1343520.0, ans=0.1 2024-08-11 23:38:15,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1343620.0, ans=0.125 2024-08-11 23:38:32,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1343720.0, ans=10.0 2024-08-11 23:38:33,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1343720.0, ans=0.0 2024-08-11 23:38:34,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1343720.0, ans=0.125 2024-08-11 23:38:42,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 3950, loss[loss=0.07369, beats_loss=0.01223, ecapa_loss=0.0001958, whisper_loss=0.0595, over 18113.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01103, ecapa_loss=0.0001892, whisper_loss=0.09359, over 3865238.10 frames. ], batch size: 75, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:39:09,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1344020.0, ans=0.1 2024-08-11 23:39:11,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-11 23:39:13,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. 
limit=15.0 2024-08-11 23:39:15,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1344020.0, ans=0.2 2024-08-11 23:39:22,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.690e+01 2.958e+01 3.481e+01 5.578e+01, threshold=5.915e+01, percent-clipped=0.0 2024-08-11 23:39:23,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2024-08-11 23:39:41,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1344220.0, ans=0.1 2024-08-11 23:39:47,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344220.0, ans=0.1 2024-08-11 23:39:51,059 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4000, loss[loss=0.1018, beats_loss=0.01215, ecapa_loss=0.0001736, whisper_loss=0.08795, over 19908.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01111, ecapa_loss=0.0001886, whisper_loss=0.0928, over 3851223.98 frames. ], batch size: 79, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:40:02,499 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 23:40:02,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1344320.0, ans=0.0 2024-08-11 23:40:15,734 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 23:40:21,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1344520.0, ans=0.2 2024-08-11 23:40:29,027 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
9 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-11 23:40:53,829 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 23:40:55,021 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 23:40:56,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1344720.0, ans=0.125 2024-08-11 23:41:00,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4050, loss[loss=0.09869, beats_loss=0.009885, ecapa_loss=0.0001925, whisper_loss=0.08688, over 16916.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001908, whisper_loss=0.09324, over 3829087.53 frames. ], batch size: 65, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:41:02,508 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.750e-02 2024-08-11 23:41:06,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1344820.0, ans=0.125 2024-08-11 23:41:10,162 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 23:41:39,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.615e+01 3.014e+01 3.367e+01 5.886e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 23:41:51,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1345120.0, ans=0.1 2024-08-11 23:42:05,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1345220.0, ans=0.125 2024-08-11 23:42:08,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4100, loss[loss=0.1162, beats_loss=0.01078, ecapa_loss=0.0001732, whisper_loss=0.1037, over 23224.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.011, ecapa_loss=0.00019, whisper_loss=0.09357, over 3867593.00 frames. ], batch size: 91, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:42:14,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1345320.0, ans=0.025 2024-08-11 23:42:23,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1345420.0, ans=0.125 2024-08-11 23:42:26,989 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 23:42:28,398 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 23:42:31,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1345420.0, ans=0.07 2024-08-11 23:42:40,955 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 23:42:54,661 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 23:43:06,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2024-08-11 23:43:10,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1345720.0, ans=0.025 2024-08-11 23:43:11,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=12.0 2024-08-11 23:43:18,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4150, loss[loss=0.09496, beats_loss=0.01182, ecapa_loss=0.0002083, whisper_loss=0.08106, over 20370.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01108, ecapa_loss=0.0001902, whisper_loss=0.09284, over 3871692.99 frames. ], batch size: 84, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:43:29,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2024-08-11 23:43:30,907 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 23:43:48,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1346020.0, ans=0.0 2024-08-11 23:43:50,741 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 23:43:58,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.669e+01 2.869e+01 3.344e+01 4.634e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-11 23:43:59,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1346120.0, ans=0.0 2024-08-11 23:44:04,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1346120.0, ans=0.125 2024-08-11 23:44:10,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1346120.0, ans=0.125 2024-08-11 23:44:17,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. 
limit=15.0 2024-08-11 23:44:18,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1346220.0, ans=0.2 2024-08-11 23:44:19,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1346220.0, ans=0.2 2024-08-11 23:44:27,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4200, loss[loss=0.07551, beats_loss=0.01534, ecapa_loss=0.0001339, whisper_loss=0.05884, over 22260.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.000189, whisper_loss=0.09333, over 3899368.56 frames. ], batch size: 90, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:44:29,266 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 23:44:50,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1346420.0, ans=0.125 2024-08-11 23:44:51,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1346420.0, ans=0.0 2024-08-11 23:45:20,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1346620.0, ans=0.125 2024-08-11 23:45:23,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1346720.0, ans=0.125 2024-08-11 23:45:23,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1346720.0, ans=15.0 2024-08-11 23:45:23,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-11 23:45:27,067 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 23:45:36,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4250, loss[loss=0.07957, beats_loss=0.01212, ecapa_loss=0.0002247, whisper_loss=0.0652, over 18040.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01121, ecapa_loss=0.0001881, whisper_loss=0.09232, over 3894695.59 frames. ], batch size: 82, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:45:36,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346820.0, ans=0.1 2024-08-11 23:45:49,196 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 23:45:51,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-11 23:45:59,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1346920.0, ans=15.0 2024-08-11 23:45:59,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-11 23:46:14,832 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 23:46:15,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1347020.0, ans=0.125 2024-08-11 23:46:17,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.600e+01 2.838e+01 3.253e+01 4.399e+01, threshold=5.676e+01, percent-clipped=0.0 2024-08-11 23:46:19,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1347120.0, ans=0.0 2024-08-11 23:46:22,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1347120.0, ans=0.035 2024-08-11 23:46:26,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1347120.0, ans=0.0 2024-08-11 23:46:35,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1347220.0, ans=0.125 2024-08-11 23:46:46,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4300, loss[loss=0.09295, beats_loss=0.01334, ecapa_loss=0.0001913, whisper_loss=0.0777, over 22170.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01113, ecapa_loss=0.000188, whisper_loss=0.09244, over 3849531.83 frames. 
], batch size: 92, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:47:04,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1347420.0, ans=0.125 2024-08-11 23:47:07,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1347420.0, ans=0.125 2024-08-11 23:47:07,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1347420.0, ans=0.125 2024-08-11 23:47:25,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1347520.0, ans=0.0 2024-08-11 23:47:33,895 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 23:47:45,102 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 23:47:45,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347720.0, ans=0.1 2024-08-11 23:47:47,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2024-08-11 23:47:47,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-11 23:47:56,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4350, loss[loss=0.1069, beats_loss=0.01052, ecapa_loss=0.0001996, whisper_loss=0.09439, over 21449.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001881, whisper_loss=0.09214, over 3839027.79 frames. 
], batch size: 88, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:47:57,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1347820.0, ans=0.09899494936611666 2024-08-11 23:48:00,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1347820.0, ans=0.0 2024-08-11 23:48:11,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-11 23:48:13,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347920.0, ans=0.1 2024-08-11 23:48:13,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-11 23:48:36,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.570e+01 2.850e+01 3.397e+01 5.504e+01, threshold=5.701e+01, percent-clipped=0.0 2024-08-11 23:48:42,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1348120.0, ans=0.025 2024-08-11 23:48:46,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1348120.0, ans=0.0 2024-08-11 23:48:47,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1348120.0, ans=0.125 2024-08-11 23:49:06,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1348320.0, ans=0.125 2024-08-11 23:49:07,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4400, loss[loss=0.09658, beats_loss=0.01062, ecapa_loss=0.000185, whisper_loss=0.08411, over 15961.00 
frames. ], tot_loss[loss=0.1056, beats_loss=0.01102, ecapa_loss=0.0001873, whisper_loss=0.09267, over 3842313.85 frames. ], batch size: 61, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:49:11,554 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 23:49:12,810 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 23:49:36,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1348520.0, ans=0.125 2024-08-11 23:49:41,091 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-11 23:49:41,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1348520.0, ans=0.2 2024-08-11 23:49:50,025 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 23:49:54,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1348620.0, ans=0.125 2024-08-11 23:50:03,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-11 23:50:08,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1348720.0, ans=0.1 2024-08-11 23:50:19,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4450, loss[loss=0.08108, beats_loss=0.0152, ecapa_loss=0.0001486, whisper_loss=0.06439, over 18772.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001864, whisper_loss=0.09261, over 3903862.54 frames. ], batch size: 76, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:50:21,204 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
13 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 23:50:28,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1348820.0, ans=0.0 2024-08-11 23:50:35,881 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 23:50:36,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1348920.0, ans=0.125 2024-08-11 23:50:39,095 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.381e+02 2024-08-11 23:50:42,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0 2024-08-11 23:50:48,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1349020.0, ans=0.125 2024-08-11 23:51:00,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1349020.0, ans=0.2 2024-08-11 23:51:01,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.643e+01 3.000e+01 3.439e+01 5.029e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-11 23:51:19,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=22.5 2024-08-11 23:51:24,767 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 23:51:29,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4500, loss[loss=0.1078, beats_loss=0.009713, ecapa_loss=0.0001699, whisper_loss=0.09643, over 17658.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001868, whisper_loss=0.0921, over 3860050.21 frames. 
], batch size: 66, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:51:37,478 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 23:51:41,410 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 23:51:55,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1349420.0, ans=0.125 2024-08-11 23:51:56,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1349520.0, ans=0.1 2024-08-11 23:51:58,076 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 23:52:10,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1349620.0, ans=0.2 2024-08-11 23:52:38,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4550, loss[loss=0.1145, beats_loss=0.01012, ecapa_loss=0.0001606, whisper_loss=0.1027, over 24386.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01101, ecapa_loss=0.0001867, whisper_loss=0.09289, over 3858421.03 frames. ], batch size: 93, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:52:47,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-08-11 23:52:49,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-11 23:53:06,516 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
29 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 23:53:19,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.529e+01 2.869e+01 3.379e+01 6.425e+01, threshold=5.739e+01, percent-clipped=1.0 2024-08-11 23:53:19,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1350120.0, ans=0.0 2024-08-11 23:53:23,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1350120.0, ans=0.0 2024-08-11 23:53:35,604 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 23:53:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1350220.0, ans=0.125 2024-08-11 23:53:48,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4600, loss[loss=0.1185, beats_loss=0.009244, ecapa_loss=0.0001692, whisper_loss=0.1076, over 17440.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.000187, whisper_loss=0.09243, over 3862749.36 frames. ], batch size: 66, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:53:50,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1350320.0, ans=0.2 2024-08-11 23:53:57,570 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 10 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 23:54:11,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2024-08-11 23:54:14,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. 
limit=22.5 2024-08-11 23:54:15,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1350420.0, ans=0.0 2024-08-11 23:54:16,664 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 23:54:25,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1350520.0, ans=0.0 2024-08-11 23:54:35,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1350620.0, ans=0.1 2024-08-11 23:54:35,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1350620.0, ans=0.1 2024-08-11 23:55:00,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4650, loss[loss=0.1019, beats_loss=0.01332, ecapa_loss=0.0001903, whisper_loss=0.08666, over 12657.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0112, ecapa_loss=0.0001864, whisper_loss=0.0913, over 3839553.93 frames. ], batch size: 54, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:55:12,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. 
limit=15.0 2024-08-11 23:55:33,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1351020.0, ans=0.125 2024-08-11 23:55:33,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1351020.0, ans=0.5 2024-08-11 23:55:37,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1351020.0, ans=0.0 2024-08-11 23:55:43,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.678e+01 3.059e+01 3.452e+01 5.229e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-11 23:55:44,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-11 23:55:52,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=15.0 2024-08-11 23:56:05,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1351220.0, ans=0.0 2024-08-11 23:56:06,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1351220.0, ans=0.1 2024-08-11 23:56:08,182 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 28 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 23:56:09,528 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 23:56:13,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4700, loss[loss=0.1171, beats_loss=0.009431, ecapa_loss=0.0002231, whisper_loss=0.1055, over 19209.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01106, ecapa_loss=0.0001882, whisper_loss=0.09264, over 3830282.69 frames. 
], batch size: 78, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:56:15,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1351320.0, ans=0.2 2024-08-11 23:56:31,194 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 23:56:44,086 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-11 23:57:05,926 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 23:57:06,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1351620.0, ans=0.125 2024-08-11 23:57:18,114 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 8 from Vox, 36 fro AS 2024-08-11 23:57:22,738 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 23:57:27,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4750, loss[loss=0.0872, beats_loss=0.01181, ecapa_loss=0.000161, whisper_loss=0.07378, over 20585.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01116, ecapa_loss=0.0001853, whisper_loss=0.09264, over 3867306.71 frames. ], batch size: 81, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:57:35,186 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 23:57:37,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-11 23:57:52,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1351920.0, ans=0.125 2024-08-11 23:57:53,966 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 23:58:00,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1352020.0, ans=10.0 2024-08-11 23:58:05,755 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 41 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-11 23:58:09,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352020.0, ans=0.1 2024-08-11 23:58:09,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.685e+01 3.044e+01 3.744e+01 5.202e+01, threshold=6.087e+01, percent-clipped=0.0 2024-08-11 23:58:27,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1352220.0, ans=0.05 2024-08-11 23:58:41,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4800, loss[loss=0.1302, beats_loss=0.008527, ecapa_loss=0.0001911, whisper_loss=0.1198, over 19714.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.0001859, whisper_loss=0.09304, over 3908990.53 frames. ], batch size: 74, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:58:42,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1352320.0, ans=0.0 2024-08-11 23:58:46,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1352320.0, ans=0.1 2024-08-11 23:58:55,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. 
limit=12.0 2024-08-11 23:58:58,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1352420.0, ans=0.2 2024-08-11 23:59:02,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1352420.0, ans=0.125 2024-08-11 23:59:15,398 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 23:59:38,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1352620.0, ans=0.1 2024-08-11 23:59:39,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1352620.0, ans=0.0 2024-08-11 23:59:50,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1352720.0, ans=0.1 2024-08-11 23:59:55,893 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 23:59:57,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4850, loss[loss=0.07935, beats_loss=0.01117, ecapa_loss=0.000243, whisper_loss=0.06574, over 14037.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01119, ecapa_loss=0.000188, whisper_loss=0.09297, over 3935626.55 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:59:58,876 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 00:00:04,853 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 00:00:07,375 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 00:00:35,715 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 00:00:39,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.736e+01 3.106e+01 3.475e+01 1.081e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-12 00:00:57,141 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-12 00:00:57,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1353220.0, ans=0.125 2024-08-12 00:01:02,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1353220.0, ans=0.125 2024-08-12 00:01:06,214 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 00:01:09,798 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:01:10,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4900, loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0002396, whisper_loss=0.09144, over 18329.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001887, whisper_loss=0.09318, over 3904618.51 frames. ], batch size: 78, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:01:53,481 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 00:01:56,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1353620.0, ans=0.2 2024-08-12 00:01:59,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353620.0, ans=0.1 2024-08-12 00:02:06,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1353620.0, ans=0.125 2024-08-12 00:02:13,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1353720.0, ans=0.125 2024-08-12 00:02:22,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=12.0 2024-08-12 00:02:24,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 4950, loss[loss=0.1211, beats_loss=0.01054, ecapa_loss=0.0001612, whisper_loss=0.109, over 20090.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0112, ecapa_loss=0.0001877, whisper_loss=0.09238, over 3891227.31 frames. ], batch size: 77, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:02:25,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1353820.0, ans=0.125 2024-08-12 00:02:26,152 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 00:02:34,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353820.0, ans=0.1 2024-08-12 00:02:34,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. 
limit=15.0 2024-08-12 00:02:41,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2024-08-12 00:02:43,591 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 00:02:43,900 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.195e+00 2024-08-12 00:02:51,439 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 00:02:52,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1353920.0, ans=0.1 2024-08-12 00:03:07,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.588e+01 2.867e+01 3.231e+01 4.752e+01, threshold=5.733e+01, percent-clipped=0.0 2024-08-12 00:03:11,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1354120.0, ans=0.0 2024-08-12 00:03:13,539 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 00:03:17,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1354120.0, ans=0.125 2024-08-12 00:03:19,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1354120.0, ans=0.125 2024-08-12 00:03:24,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1354220.0, ans=0.05 2024-08-12 00:03:30,419 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 00:03:37,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5000, loss[loss=0.08168, beats_loss=0.01411, ecapa_loss=0.0001929, whisper_loss=0.06564, over 18685.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01131, ecapa_loss=0.0001867, whisper_loss=0.09175, over 3887846.38 frames. ], batch size: 81, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:03:49,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1354320.0, ans=0.125 2024-08-12 00:04:00,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1354420.0, ans=0.0 2024-08-12 00:04:13,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1354520.0, ans=0.125 2024-08-12 00:04:15,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1354520.0, ans=0.2 2024-08-12 00:04:33,509 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 00:04:48,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5050, loss[loss=0.1129, beats_loss=0.01139, ecapa_loss=0.000144, whisper_loss=0.1, over 17019.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01136, ecapa_loss=0.0001875, whisper_loss=0.0912, over 3886691.27 frames. 
], batch size: 62, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:05:07,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1354920.0, ans=0.07 2024-08-12 00:05:12,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1354920.0, ans=0.0 2024-08-12 00:05:13,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1354920.0, ans=0.125 2024-08-12 00:05:25,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1355020.0, ans=0.125 2024-08-12 00:05:29,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.677e+01 3.041e+01 3.640e+01 6.697e+01, threshold=6.081e+01, percent-clipped=3.0 2024-08-12 00:05:37,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1355120.0, ans=0.0 2024-08-12 00:05:39,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1355120.0, ans=0.125 2024-08-12 00:05:43,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1355120.0, ans=0.1 2024-08-12 00:05:57,785 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 00:06:00,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5100, loss[loss=0.1274, beats_loss=0.009438, ecapa_loss=0.0001905, whisper_loss=0.116, over 20049.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01133, ecapa_loss=0.0001876, whisper_loss=0.09139, over 3898938.36 frames. 
], batch size: 77, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:06:02,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1355320.0, ans=0.125 2024-08-12 00:06:12,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.40 vs. limit=10.0 2024-08-12 00:06:21,370 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 00:06:21,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1355420.0, ans=0.07 2024-08-12 00:06:27,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1355520.0, ans=0.125 2024-08-12 00:06:42,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1355620.0, ans=0.0 2024-08-12 00:06:50,646 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 00:06:54,941 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 00:07:09,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0 2024-08-12 00:07:09,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5150, loss[loss=0.1035, beats_loss=0.009808, ecapa_loss=0.0001938, whisper_loss=0.09178, over 15677.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01134, ecapa_loss=0.0001867, whisper_loss=0.09146, over 3895402.16 frames. ], batch size: 64, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:07:16,982 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 00:07:24,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-08-12 00:07:32,925 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 00:07:46,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1356020.0, ans=0.125 2024-08-12 00:07:50,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.961e+01 3.572e+01 5.621e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-12 00:07:51,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=12.0 2024-08-12 00:07:55,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1356120.0, ans=0.125 2024-08-12 00:08:01,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-12 00:08:04,070 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 00:08:06,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1356220.0, ans=15.0 2024-08-12 00:08:09,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1356220.0, ans=10.0 2024-08-12 00:08:11,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1356220.0, ans=0.125 2024-08-12 00:08:12,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1356220.0, ans=0.125 2024-08-12 00:08:15,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1356220.0, ans=0.125 2024-08-12 00:08:16,417 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 18 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 00:08:20,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5200, loss[loss=0.09608, beats_loss=0.01153, ecapa_loss=0.0001785, whisper_loss=0.08276, over 15798.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01139, ecapa_loss=0.000185, whisper_loss=0.09112, over 3893217.73 frames. ], batch size: 62, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:08:24,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1356320.0, ans=10.0 2024-08-12 00:08:35,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356420.0, ans=0.1 2024-08-12 00:08:37,704 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-12 00:09:21,025 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
30 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-12 00:09:24,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1356720.0, ans=0.0 2024-08-12 00:09:24,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1356720.0, ans=0.2 2024-08-12 00:09:30,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5250, loss[loss=0.1078, beats_loss=0.009931, ecapa_loss=0.0001619, whisper_loss=0.09624, over 18084.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01133, ecapa_loss=0.0001849, whisper_loss=0.09089, over 3856004.52 frames. ], batch size: 70, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:09:32,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1356820.0, ans=0.2 2024-08-12 00:09:43,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356920.0, ans=0.1 2024-08-12 00:09:57,817 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 00:10:06,039 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 00:10:06,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1357020.0, ans=0.125 2024-08-12 00:10:11,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.537e+01 2.858e+01 3.258e+01 4.916e+01, threshold=5.717e+01, percent-clipped=0.0 2024-08-12 00:10:28,519 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:10:40,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5300, loss[loss=0.1225, beats_loss=0.009645, ecapa_loss=0.000151, whisper_loss=0.1113, over 16823.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.000186, whisper_loss=0.0921, over 3859647.16 frames. ], batch size: 63, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:10:41,092 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:11:00,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1357420.0, ans=0.0 2024-08-12 00:11:08,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1357520.0, ans=0.125 2024-08-12 00:11:11,435 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 00:11:30,884 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 00:11:34,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1357620.0, ans=0.1 2024-08-12 00:11:40,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1357720.0, ans=0.0 2024-08-12 00:11:49,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5350, loss[loss=0.1212, beats_loss=0.008114, ecapa_loss=0.0001724, whisper_loss=0.1114, over 15937.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0112, ecapa_loss=0.0001866, whisper_loss=0.09177, over 3852838.45 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:11:49,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1357820.0, ans=0.1 2024-08-12 00:12:05,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.09 vs. 
limit=15.0 2024-08-12 00:12:10,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1357920.0, ans=0.125 2024-08-12 00:12:12,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1357920.0, ans=0.1 2024-08-12 00:12:14,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1357920.0, ans=0.125 2024-08-12 00:12:18,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1358020.0, ans=0.0 2024-08-12 00:12:23,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1358020.0, ans=0.1 2024-08-12 00:12:30,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.572e+01 2.816e+01 3.245e+01 5.813e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 00:12:34,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1358120.0, ans=0.0 2024-08-12 00:12:36,763 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 00:12:44,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2024-08-12 00:12:54,735 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 00:13:01,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5400, loss[loss=0.0987, beats_loss=0.01162, ecapa_loss=0.0001529, whisper_loss=0.08555, over 16516.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001856, whisper_loss=0.0919, over 3871466.91 frames. 
], batch size: 64, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:13:06,589 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 00:13:13,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1358320.0, ans=0.025 2024-08-12 00:13:22,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1358420.0, ans=0.0 2024-08-12 00:13:28,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1358420.0, ans=0.0 2024-08-12 00:13:28,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1358420.0, ans=0.1 2024-08-12 00:13:44,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1358520.0, ans=0.125 2024-08-12 00:13:48,544 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-12 00:13:53,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1358620.0, ans=0.2 2024-08-12 00:13:54,654 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 00:14:10,285 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-12 00:14:13,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1358720.0, ans=0.125 2024-08-12 00:14:18,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5450, loss[loss=0.1288, beats_loss=0.007786, ecapa_loss=0.0001839, whisper_loss=0.1192, over 17818.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01115, ecapa_loss=0.0001874, whisper_loss=0.09223, over 3842383.01 frames. ], batch size: 69, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:14:23,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-12 00:14:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1358920.0, ans=0.035 2024-08-12 00:14:36,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0 2024-08-12 00:14:42,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1358920.0, ans=0.125 2024-08-12 00:14:49,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-12 00:14:57,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1359020.0, ans=0.0 2024-08-12 00:15:05,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.617e+01 2.957e+01 3.359e+01 7.305e+01, threshold=5.914e+01, percent-clipped=2.0 2024-08-12 00:15:08,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-12 00:15:22,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=12.0 2024-08-12 00:15:36,707 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
12 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 00:15:44,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1359220.0, ans=0.125 2024-08-12 00:15:46,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5500, loss[loss=0.09012, beats_loss=0.01344, ecapa_loss=0.0001629, whisper_loss=0.07505, over 20071.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001858, whisper_loss=0.09272, over 3880105.34 frames. ], batch size: 80, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:15:59,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1359320.0, ans=0.2 2024-08-12 00:16:05,154 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 00:16:17,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1359420.0, ans=0.125 2024-08-12 00:16:20,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1359420.0, ans=0.125 2024-08-12 00:16:21,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1359420.0, ans=0.1 2024-08-12 00:16:30,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1359520.0, ans=0.125 2024-08-12 00:16:41,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1359520.0, ans=0.125 2024-08-12 00:17:05,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.60 vs. 
limit=15.0 2024-08-12 00:17:11,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1359720.0, ans=0.0 2024-08-12 00:17:19,705 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5550, loss[loss=0.09245, beats_loss=0.014, ecapa_loss=0.0002034, whisper_loss=0.07642, over 20984.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001861, whisper_loss=0.0929, over 3877693.87 frames. ], batch size: 90, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:17:31,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-12 00:17:39,569 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-12 00:17:55,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-12 00:17:55,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. 
limit=22.5 2024-08-12 00:18:14,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1360020.0, ans=0.0 2024-08-12 00:18:14,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.662e+01 3.000e+01 3.511e+01 5.450e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-12 00:18:23,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1360120.0, ans=0.2 2024-08-12 00:18:25,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1360120.0, ans=0.0 2024-08-12 00:18:40,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1360220.0, ans=0.0 2024-08-12 00:18:48,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1360220.0, ans=0.125 2024-08-12 00:18:53,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5600, loss[loss=0.1139, beats_loss=0.009693, ecapa_loss=0.0001954, whisper_loss=0.1022, over 19420.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01112, ecapa_loss=0.000185, whisper_loss=0.09278, over 3895721.49 frames. ], batch size: 75, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:18:53,692 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 00:19:01,347 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 00:19:16,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1360420.0, ans=0.1 2024-08-12 00:19:25,024 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 00:19:27,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1360420.0, ans=0.0 2024-08-12 00:19:59,123 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 00:20:07,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1360720.0, ans=0.125 2024-08-12 00:20:24,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5650, loss[loss=0.07177, beats_loss=0.01199, ecapa_loss=0.0001854, whisper_loss=0.05792, over 16211.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001851, whisper_loss=0.09246, over 3868278.14 frames. ], batch size: 67, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:20:25,101 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 00:20:25,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1360820.0, ans=0.0 2024-08-12 00:20:26,263 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 00:20:29,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1360820.0, ans=0.125 2024-08-12 00:20:41,116 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 00:20:48,048 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 00:21:04,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.708e+01 3.179e+01 3.775e+01 1.197e+02, threshold=6.358e+01, percent-clipped=2.0 2024-08-12 00:21:16,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1361120.0, ans=0.125 2024-08-12 00:21:32,911 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5700, loss[loss=0.07653, beats_loss=0.01251, ecapa_loss=0.0002189, whisper_loss=0.06183, over 16774.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.09245, over 3889563.60 frames. ], batch size: 71, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:21:56,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2024-08-12 00:22:03,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1361520.0, ans=0.0 2024-08-12 00:22:37,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1361720.0, ans=0.125 2024-08-12 00:22:40,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5750, loss[loss=0.102, beats_loss=0.008092, ecapa_loss=0.0001736, whisper_loss=0.09217, over 18841.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001854, whisper_loss=0.09246, over 3895848.05 frames. ], batch size: 74, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:22:43,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=15.0 2024-08-12 00:22:47,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1361820.0, ans=0.125 2024-08-12 00:23:01,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-12 00:23:02,439 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 00:23:05,337 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 00:23:08,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1362020.0, ans=0.125 2024-08-12 00:23:20,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.574e+01 2.789e+01 3.089e+01 4.490e+01, threshold=5.577e+01, percent-clipped=0.0 2024-08-12 00:23:49,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5800, loss[loss=0.1142, beats_loss=0.01132, ecapa_loss=0.0001942, whisper_loss=0.1009, over 22235.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01108, ecapa_loss=0.0001864, whisper_loss=0.09287, over 3882137.71 frames. ], batch size: 91, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:23:53,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1362320.0, ans=0.125 2024-08-12 00:24:03,403 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 00:24:03,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1362420.0, ans=0.1 2024-08-12 00:24:09,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.82 vs. 
limit=15.0
2024-08-12 00:24:18,554 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS
2024-08-12 00:24:19,867 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS
2024-08-12 00:24:40,763 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-12 00:24:41,880 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-12 00:24:45,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362720.0, ans=0.1
2024-08-12 00:24:48,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1362720.0, ans=0.04949747468305833
2024-08-12 00:24:56,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1362720.0, ans=0.5
2024-08-12 00:24:58,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5850, loss[loss=0.08949, beats_loss=0.01218, ecapa_loss=0.0001722, whisper_loss=0.07558, over 19799.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01113, ecapa_loss=0.0001857, whisper_loss=0.09299, over 3878494.19 frames. ], batch size: 78, lr: 6.35e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:25:07,785 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS
2024-08-12 00:25:09,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.24 vs.
limit=12.0
2024-08-12 00:25:36,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1363020.0, ans=0.125
2024-08-12 00:25:37,568 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.515e+01 2.804e+01 3.095e+01 4.578e+01, threshold=5.608e+01, percent-clipped=0.0
2024-08-12 00:25:56,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1363220.0, ans=0.125
2024-08-12 00:26:06,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5900, loss[loss=0.07248, beats_loss=0.01055, ecapa_loss=0.0001741, whisper_loss=0.06019, over 14523.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0001873, whisper_loss=0.09277, over 3859720.85 frames. ], batch size: 59, lr: 6.35e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:26:24,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1363420.0, ans=0.2
2024-08-12 00:26:26,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1363420.0, ans=0.125
2024-08-12 00:26:31,337 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 25 from LS+wenet, 12 from Vox, 21 fro AS
2024-08-12 00:26:40,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1363520.0, ans=0.0
2024-08-12 00:26:40,990 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
23 from LS+wenet, 24 from Vox, 38 fro AS
2024-08-12 00:26:49,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1363620.0, ans=0.125
2024-08-12 00:26:57,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1363620.0, ans=0.125
2024-08-12 00:27:03,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1363720.0, ans=0.0
2024-08-12 00:27:05,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=12.0
2024-08-12 00:27:07,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0
2024-08-12 00:27:14,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 5950, loss[loss=0.09927, beats_loss=0.01312, ecapa_loss=0.0001854, whisper_loss=0.08429, over 21247.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01105, ecapa_loss=0.0001862, whisper_loss=0.0926, over 3839088.17 frames. ], batch size: 90, lr: 6.35e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:27:20,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5
2024-08-12 00:27:25,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1363820.0, ans=0.125
2024-08-12 00:27:31,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1363920.0, ans=0.125
2024-08-12 00:27:52,173 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
31 from LS+wenet, 12 from Vox, 26 fro AS
2024-08-12 00:27:53,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.619e+01 2.853e+01 3.292e+01 6.548e+01, threshold=5.706e+01, percent-clipped=1.0
2024-08-12 00:28:18,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1364220.0, ans=0.0
2024-08-12 00:28:20,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0
2024-08-12 00:28:22,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6000, loss[loss=0.1069, beats_loss=0.01099, ecapa_loss=0.0001699, whisper_loss=0.09421, over 17848.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001853, whisper_loss=0.09255, over 3861633.13 frames. ], batch size: 69, lr: 6.35e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:28:22,351 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-12 00:29:04,118 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on ASR_libri: loss=0.2569, beats_loss=0, ecapa_loss=0.0006172, whisper_loss=0.2508, over 922467.00 frames.
2024-08-12 00:29:22,635 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on SV_voxceleb1: loss=0.005036, beats_loss=0, ecapa_loss=0.0005036, whisper_loss=0, over 939242.00 frames.
2024-08-12 00:31:25,906 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 00:31:25,910 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-12 00:31:26,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1364320.0, ans=0.125
2024-08-12 00:32:19,442 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
31 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-12 00:32:34,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6050, loss[loss=0.1108, beats_loss=0.0119, ecapa_loss=0.0001653, whisper_loss=0.09722, over 23099.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001844, whisper_loss=0.09251, over 3878016.42 frames. ], batch size: 92, lr: 6.35e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:32:35,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0
2024-08-12 00:32:39,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1364820.0, ans=0.0
2024-08-12 00:32:48,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1364920.0, ans=10.0
2024-08-12 00:32:51,110 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 fro AS
2024-08-12 00:32:54,037 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-12 00:32:54,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0
2024-08-12 00:32:55,457 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-12 00:32:58,444 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS
2024-08-12 00:32:58,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1364920.0, ans=0.0
2024-08-12 00:33:01,449 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
22 from LS+wenet, 18 from Vox, 35 fro AS
2024-08-12 00:33:16,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.639e+01 2.972e+01 3.364e+01 6.267e+01, threshold=5.943e+01, percent-clipped=1.0
2024-08-12 00:33:22,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1365120.0, ans=0.2
2024-08-12 00:33:25,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1365120.0, ans=0.2
2024-08-12 00:33:29,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1365220.0, ans=0.2
2024-08-12 00:33:33,070 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS
2024-08-12 00:33:35,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2024-08-12 00:33:37,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=12.0
2024-08-12 00:33:40,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1365220.0, ans=0.1
2024-08-12 00:33:44,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6100, loss[loss=0.1038, beats_loss=0.01228, ecapa_loss=0.0001376, whisper_loss=0.09011, over 21243.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001855, whisper_loss=0.09183, over 3880175.47 frames. ], batch size: 80, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:34:32,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1365620.0, ans=0.0
2024-08-12 00:34:48,388 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
28 from LS+wenet, 26 from Vox, 39 fro AS
2024-08-12 00:34:54,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6150, loss[loss=0.1017, beats_loss=0.009324, ecapa_loss=0.0002489, whisper_loss=0.08988, over 22437.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001885, whisper_loss=0.09324, over 3873676.58 frames. ], batch size: 97, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:35:03,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0
2024-08-12 00:35:10,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1365920.0, ans=0.125
2024-08-12 00:35:21,999 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS
2024-08-12 00:35:30,661 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS
2024-08-12 00:35:36,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.497e+01 2.771e+01 3.038e+01 4.710e+01, threshold=5.541e+01, percent-clipped=0.0
2024-08-12 00:35:51,308 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS
2024-08-12 00:35:55,358 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS
2024-08-12 00:35:59,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1366220.0, ans=0.1
2024-08-12 00:36:03,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6200, loss[loss=0.09746, beats_loss=0.01038, ecapa_loss=0.0001479, whisper_loss=0.0856, over 17175.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01099, ecapa_loss=0.0001872, whisper_loss=0.0938, over 3920246.64 frames.
], batch size: 66, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:36:38,540 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 fro AS
2024-08-12 00:36:42,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1366520.0, ans=0.5
2024-08-12 00:36:52,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5
2024-08-12 00:37:02,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1366720.0, ans=0.07
2024-08-12 00:37:11,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1366820.0, ans=0.125
2024-08-12 00:37:12,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6250, loss[loss=0.09979, beats_loss=0.01193, ecapa_loss=0.0001729, whisper_loss=0.08613, over 22302.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01102, ecapa_loss=0.0001868, whisper_loss=0.0937, over 3914890.13 frames. ], batch size: 90, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:37:17,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1366820.0, ans=0.2
2024-08-12 00:37:18,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1366820.0, ans=0.07
2024-08-12 00:37:33,601 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-12 00:37:43,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0
2024-08-12 00:37:44,609 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
21 from LS+wenet, 21 from Vox, 34 fro AS
2024-08-12 00:37:50,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1367020.0, ans=0.125
2024-08-12 00:37:53,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.633e+01 2.869e+01 3.281e+01 7.272e+01, threshold=5.739e+01, percent-clipped=3.0
2024-08-12 00:38:02,647 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS
2024-08-12 00:38:07,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2024-08-12 00:38:21,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs. limit=6.0
2024-08-12 00:38:21,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6300, loss[loss=0.1126, beats_loss=0.01209, ecapa_loss=0.0001879, whisper_loss=0.09863, over 17635.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01104, ecapa_loss=0.0001856, whisper_loss=0.09385, over 3914159.31 frames. ], batch size: 72, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:38:25,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1367320.0, ans=0.125
2024-08-12 00:38:48,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1367520.0, ans=0.0
2024-08-12 00:38:51,951 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
22 from LS+wenet, 13 from Vox, 25 fro AS
2024-08-12 00:38:56,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1367520.0, ans=0.125
2024-08-12 00:39:05,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1367620.0, ans=0.0
2024-08-12 00:39:07,478 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 fro AS
2024-08-12 00:39:14,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1367620.0, ans=0.0
2024-08-12 00:39:30,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1367820.0, ans=0.125
2024-08-12 00:39:30,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6350, loss[loss=0.07805, beats_loss=0.01336, ecapa_loss=0.0001765, whisper_loss=0.06293, over 15543.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01104, ecapa_loss=0.000187, whisper_loss=0.09397, over 3893866.00 frames. ], batch size: 64, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:39:54,370 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 20 from Vox, 50 fro AS
2024-08-12 00:40:06,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1368020.0, ans=0.07
2024-08-12 00:40:07,467 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 22 from Vox, 21 fro AS
2024-08-12 00:40:08,807 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
20 from LS+wenet, 23 from Vox, 32 fro AS
2024-08-12 00:40:09,157 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:40:12,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.594e+01 2.991e+01 3.551e+01 3.558e+02, threshold=5.982e+01, percent-clipped=1.0
2024-08-12 00:40:40,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6400, loss[loss=0.1303, beats_loss=0.008861, ecapa_loss=0.00016, whisper_loss=0.1198, over 22234.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01107, ecapa_loss=0.0001871, whisper_loss=0.09356, over 3885747.30 frames. ], batch size: 81, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:40:48,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.04 vs. limit=22.5
2024-08-12 00:41:15,349 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS
2024-08-12 00:41:48,282 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:41:49,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6450, loss[loss=0.1205, beats_loss=0.01044, ecapa_loss=0.0001936, whisper_loss=0.1081, over 16682.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01111, ecapa_loss=0.0001864, whisper_loss=0.09316, over 3869834.70 frames.
], batch size: 69, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:41:53,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1368820.0, ans=0.07
2024-08-12 00:41:57,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1368820.0, ans=0.125
2024-08-12 00:42:11,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1368920.0, ans=0.5
2024-08-12 00:42:16,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1369020.0, ans=0.0
2024-08-12 00:42:30,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.638e+01 2.996e+01 3.413e+01 4.809e+01, threshold=5.992e+01, percent-clipped=1.0
2024-08-12 00:42:33,629 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.915e-01
2024-08-12 00:42:58,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6500, loss[loss=0.08783, beats_loss=0.01352, ecapa_loss=0.0001564, whisper_loss=0.07275, over 22690.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01108, ecapa_loss=0.000187, whisper_loss=0.09346, over 3908626.20 frames. ], batch size: 92, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:42:59,782 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS
2024-08-12 00:43:02,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1369320.0, ans=0.125
2024-08-12 00:43:13,819 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
17 from LS+wenet, 17 from Vox, 23 fro AS
2024-08-12 00:43:15,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1369420.0, ans=0.125
2024-08-12 00:43:17,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1369420.0, ans=0.09899494936611666
2024-08-12 00:43:21,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.80 vs. limit=22.5
2024-08-12 00:43:41,366 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 fro AS
2024-08-12 00:43:41,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1369620.0, ans=0.1
2024-08-12 00:43:57,607 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 22 from Vox, 40 fro AS
2024-08-12 00:44:00,509 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 24 from LS+wenet, 28 from Vox, 44 fro AS
2024-08-12 00:44:01,744 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-12 00:44:02,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1369720.0, ans=0.125
2024-08-12 00:44:06,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1369820.0, ans=0.0
2024-08-12 00:44:07,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6550, loss[loss=0.1082, beats_loss=0.01178, ecapa_loss=0.0001752, whisper_loss=0.0947, over 22068.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01118, ecapa_loss=0.0001865, whisper_loss=0.09313, over 3940432.25 frames.
], batch size: 90, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:44:07,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1369820.0, ans=0.1
2024-08-12 00:44:08,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.01 vs. limit=22.5
2024-08-12 00:44:12,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0
2024-08-12 00:44:25,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1369920.0, ans=0.125
2024-08-12 00:44:26,427 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 13 from Vox, 28 fro AS
2024-08-12 00:44:30,511 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-12 00:44:33,563 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS
2024-08-12 00:44:36,549 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-12 00:44:48,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.662e+01 3.000e+01 3.439e+01 5.833e+01, threshold=5.999e+01, percent-clipped=0.0
2024-08-12 00:45:08,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1370220.0, ans=0.125
2024-08-12 00:45:16,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6600, loss[loss=0.1303, beats_loss=0.009273, ecapa_loss=0.0001783, whisper_loss=0.1193, over 15549.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01114, ecapa_loss=0.0001867, whisper_loss=0.09373, over 3956508.39 frames.
], batch size: 59, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:45:20,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1370320.0, ans=0.125
2024-08-12 00:45:33,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0
2024-08-12 00:45:46,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1370520.0, ans=0.125
2024-08-12 00:45:50,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0
2024-08-12 00:45:50,780 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 25 from Vox, 17 fro AS
2024-08-12 00:45:57,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0
2024-08-12 00:45:59,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1370620.0, ans=0.0
2024-08-12 00:46:16,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1370720.0, ans=0.1
2024-08-12 00:46:25,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6650, loss[loss=0.1325, beats_loss=0.009895, ecapa_loss=0.0001794, whisper_loss=0.1208, over 20473.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01103, ecapa_loss=0.0001864, whisper_loss=0.09376, over 3912351.62 frames. ], batch size: 78, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:46:40,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.61 vs.
limit=12.0
2024-08-12 00:46:41,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1370920.0, ans=0.125
2024-08-12 00:46:52,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1371020.0, ans=0.125
2024-08-12 00:47:04,277 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS
2024-08-12 00:47:06,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.593e+01 2.812e+01 3.124e+01 4.169e+01, threshold=5.623e+01, percent-clipped=0.0
2024-08-12 00:47:10,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1371120.0, ans=0.1
2024-08-12 00:47:12,386 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 26 from Vox, 32 fro AS
2024-08-12 00:47:16,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1371120.0, ans=0.125
2024-08-12 00:47:24,982 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS
2024-08-12 00:47:28,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1371220.0, ans=0.95
2024-08-12 00:47:34,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6700, loss[loss=0.08848, beats_loss=0.01201, ecapa_loss=0.000187, whisper_loss=0.0746, over 13796.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01106, ecapa_loss=0.0001859, whisper_loss=0.09358, over 3900588.06 frames.
], batch size: 57, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:47:35,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1371320.0, ans=0.1
2024-08-12 00:47:55,426 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS
2024-08-12 00:47:58,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=12.0
2024-08-12 00:48:00,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1371420.0, ans=0.1
2024-08-12 00:48:07,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.11 vs. limit=15.0
2024-08-12 00:48:20,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0
2024-08-12 00:48:23,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1371620.0, ans=0.125
2024-08-12 00:48:28,377 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS
2024-08-12 00:48:41,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1371720.0, ans=0.125
2024-08-12 00:48:44,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6750, loss[loss=0.09268, beats_loss=0.01233, ecapa_loss=0.0001673, whisper_loss=0.07867, over 19596.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01107, ecapa_loss=0.0001874, whisper_loss=0.09268, over 3896611.21 frames.
], batch size: 78, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:48:50,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1371820.0, ans=0.125
2024-08-12 00:49:26,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.541e+01 2.925e+01 3.464e+01 4.634e+01, threshold=5.851e+01, percent-clipped=0.0
2024-08-12 00:49:29,741 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 41 from LS+wenet, 25 from Vox, 29 fro AS
2024-08-12 00:49:34,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1372120.0, ans=0.0
2024-08-12 00:49:35,267 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS
2024-08-12 00:49:46,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1372220.0, ans=0.125
2024-08-12 00:49:54,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6800, loss[loss=0.1262, beats_loss=0.01134, ecapa_loss=0.0001702, whisper_loss=0.1132, over 22918.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01102, ecapa_loss=0.0001889, whisper_loss=0.09266, over 3887068.25 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:50:06,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs.
limit=15.0 2024-08-12 00:50:07,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1372420.0, ans=0.0 2024-08-12 00:50:29,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1372520.0, ans=0.125 2024-08-12 00:50:31,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1372520.0, ans=0.125 2024-08-12 00:50:34,775 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 from AS 2024-08-12 00:50:40,104 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 from AS 2024-08-12 00:50:41,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1372620.0, ans=0.2 2024-08-12 00:50:45,624 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-12 00:50:47,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1372620.0, ans=0.0 2024-08-12 00:51:03,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6850, loss[loss=0.1142, beats_loss=0.01029, ecapa_loss=0.0001706, whisper_loss=0.1022, over 22907.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0001872, whisper_loss=0.09158, over 3868859.80 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:51:12,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2024-08-12 00:51:26,713 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 00:51:35,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-12 00:51:39,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1373020.0, ans=0.125 2024-08-12 00:51:44,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.602e+01 2.969e+01 3.307e+01 6.186e+01, threshold=5.938e+01, percent-clipped=1.0 2024-08-12 00:51:50,449 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 00:52:00,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1373220.0, ans=0.125 2024-08-12 00:52:02,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2024-08-12 00:52:09,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2024-08-12 00:52:12,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6900, loss[loss=0.08518, beats_loss=0.01278, ecapa_loss=0.0002315, whisper_loss=0.07009, over 15614.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01124, ecapa_loss=0.0001868, whisper_loss=0.09068, over 3841930.85 frames. ], batch size: 64, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:52:22,856 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 26 from Vox, 16 from AS 2024-08-12 00:52:37,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=12.0 2024-08-12 00:52:41,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1373520.0, ans=0.0 2024-08-12 00:53:07,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1373620.0, ans=0.125 2024-08-12 00:53:11,093 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 26 from Vox, 29 from AS 2024-08-12 00:53:14,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-12 00:53:17,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1373720.0, ans=0.125 2024-08-12 00:53:19,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1373720.0, ans=0.0 2024-08-12 00:53:21,069 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 21 from Vox, 44 from AS 2024-08-12 00:53:23,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 6950, loss[loss=0.09645, beats_loss=0.009471, ecapa_loss=0.0002058, whisper_loss=0.08492, over 18173.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01121, ecapa_loss=0.0001868, whisper_loss=0.09063, over 3817116.67 frames. ], batch size: 74, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:53:26,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1373820.0, ans=0.0 2024-08-12 00:53:29,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1373820.0, ans=0.125 2024-08-12 00:53:34,659 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 00:54:04,637 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 00:54:05,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.522e+01 2.749e+01 3.045e+01 4.953e+01, threshold=5.497e+01, percent-clipped=0.0 2024-08-12 00:54:17,243 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 00:54:29,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1374220.0, ans=0.125 2024-08-12 00:54:33,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7000, loss[loss=0.1359, beats_loss=0.009612, ecapa_loss=0.0001693, whisper_loss=0.1246, over 23069.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01119, ecapa_loss=0.000187, whisper_loss=0.09094, over 3802638.37 frames. ], batch size: 90, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:54:41,924 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 00:54:57,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1374420.0, ans=0.125 2024-08-12 00:55:07,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1374520.0, ans=0.2 2024-08-12 00:55:08,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1374520.0, ans=0.125 2024-08-12 00:55:13,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. 
limit=6.0 2024-08-12 00:55:14,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1374620.0, ans=0.1 2024-08-12 00:55:26,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1374620.0, ans=0.125 2024-08-12 00:55:27,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1374720.0, ans=0.1 2024-08-12 00:55:33,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-12 00:55:34,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1374720.0, ans=10.0 2024-08-12 00:55:41,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7050, loss[loss=0.1149, beats_loss=0.01035, ecapa_loss=0.0001865, whisper_loss=0.1027, over 22891.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01118, ecapa_loss=0.0001861, whisper_loss=0.09158, over 3813440.51 frames. ], batch size: 91, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:55:48,824 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 00:55:52,037 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 from AS 2024-08-12 00:56:02,130 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 from AS 2024-08-12 00:56:11,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1375020.0, ans=0.0 2024-08-12 00:56:13,690 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
23 from LS+wenet, 15 from Vox, 21 from AS 2024-08-12 00:56:23,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.564e+01 2.939e+01 3.594e+01 1.844e+02, threshold=5.878e+01, percent-clipped=7.0 2024-08-12 00:56:34,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1375120.0, ans=10.0 2024-08-12 00:56:38,574 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS 2024-08-12 00:56:45,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1375220.0, ans=0.125 2024-08-12 00:56:50,739 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7100, loss[loss=0.08659, beats_loss=0.01439, ecapa_loss=0.0001582, whisper_loss=0.07062, over 23283.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.0001836, whisper_loss=0.09168, over 3812668.56 frames. ], batch size: 93, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:56:54,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-12 00:57:03,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1375420.0, ans=0.125 2024-08-12 00:57:09,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-12 00:57:12,807 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-12 00:57:15,420 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
17 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 00:57:18,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1375520.0, ans=0.1 2024-08-12 00:57:19,480 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 from AS 2024-08-12 00:57:25,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=8.0 2024-08-12 00:57:54,634 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 25 from Vox, 31 from AS 2024-08-12 00:57:58,701 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 from AS 2024-08-12 00:57:59,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7150, loss[loss=0.1036, beats_loss=0.009799, ecapa_loss=0.0002169, whisper_loss=0.09163, over 17066.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01117, ecapa_loss=0.0001832, whisper_loss=0.09154, over 3834102.40 frames. ], batch size: 69, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:58:11,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1375820.0, ans=0.125 2024-08-12 00:58:27,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1376020.0, ans=0.125 2024-08-12 00:58:29,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1376020.0, ans=0.2 2024-08-12 00:58:30,859 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 00:58:36,635 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 00:58:42,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.592e+01 2.864e+01 3.293e+01 5.608e+01, threshold=5.729e+01, percent-clipped=0.0 2024-08-12 00:59:05,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1376220.0, ans=0.1 2024-08-12 00:59:09,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7200, loss[loss=0.1329, beats_loss=0.009955, ecapa_loss=0.0001951, whisper_loss=0.121, over 22111.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0112, ecapa_loss=0.0001825, whisper_loss=0.09141, over 3841806.13 frames. ], batch size: 88, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:59:12,361 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 from AS 2024-08-12 00:59:13,653 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 from AS 2024-08-12 00:59:19,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1376320.0, ans=0.125 2024-08-12 00:59:56,859 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 from AS 2024-08-12 00:59:58,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1376620.0, ans=0.1 2024-08-12 01:00:17,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7250, loss[loss=0.1045, beats_loss=0.01233, ecapa_loss=0.0001554, whisper_loss=0.09059, over 19765.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01124, ecapa_loss=0.0001819, whisper_loss=0.09166, over 3865429.27 frames. 
], batch size: 76, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:00:18,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.77 vs. limit=22.5 2024-08-12 01:00:23,699 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 from AS 2024-08-12 01:00:40,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1376920.0, ans=0.125 2024-08-12 01:00:44,331 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 01:00:47,006 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 from AS 2024-08-12 01:00:59,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.509e+01 2.818e+01 3.163e+01 4.594e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-12 01:01:02,528 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 01:01:05,440 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-12 01:01:27,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7300, loss[loss=0.09123, beats_loss=0.01154, ecapa_loss=0.0002166, whisper_loss=0.07753, over 21517.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01115, ecapa_loss=0.0001845, whisper_loss=0.0921, over 3866100.86 frames. ], batch size: 93, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:02:33,412 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS 2024-08-12 01:02:37,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7350, loss[loss=0.1168, beats_loss=0.008193, ecapa_loss=0.0001781, whisper_loss=0.1068, over 14402.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01115, ecapa_loss=0.0001839, whisper_loss=0.09189, over 3854588.20 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:02:44,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1377820.0, ans=0.02 2024-08-12 01:02:45,194 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS 2024-08-12 01:03:04,788 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS 2024-08-12 01:03:13,570 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 from AS 2024-08-12 01:03:15,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1378020.0, ans=0.125 2024-08-12 01:03:17,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-12 01:03:18,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.545e+01 2.938e+01 3.274e+01 5.414e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 01:03:20,471 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 12 from Vox, 32 from AS 2024-08-12 01:03:34,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1378220.0, ans=0.125 2024-08-12 01:03:38,523 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 from AS 2024-08-12 01:03:39,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. 
limit=10.0 2024-08-12 01:03:44,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1378220.0, ans=0.125 2024-08-12 01:03:46,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7400, loss[loss=0.1049, beats_loss=0.01198, ecapa_loss=0.0001597, whisper_loss=0.09129, over 22723.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01116, ecapa_loss=0.0001836, whisper_loss=0.09197, over 3869813.90 frames. ], batch size: 91, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:03:51,755 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 from AS 2024-08-12 01:04:04,483 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 01:04:05,829 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 12 from Vox, 36 from AS 2024-08-12 01:04:08,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0 2024-08-12 01:04:10,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1378420.0, ans=0.0 2024-08-12 01:04:10,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.57 vs. limit=22.5 2024-08-12 01:04:13,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1378520.0, ans=0.0 2024-08-12 01:04:13,131 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.072e+00 2024-08-12 01:04:20,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=15.0 2024-08-12 01:04:23,641 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 15 from LS+wenet, 30 from Vox, 30 from AS 2024-08-12 01:04:25,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1378520.0, ans=0.125 2024-08-12 01:04:26,290 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 from AS 2024-08-12 01:04:32,925 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS 2024-08-12 01:04:41,269 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 01:04:54,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-12 01:04:54,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7450, loss[loss=0.09148, beats_loss=0.01189, ecapa_loss=0.0002142, whisper_loss=0.07744, over 18402.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.0001852, whisper_loss=0.09171, over 3850463.40 frames. ], batch size: 77, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:05:02,417 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 01:05:13,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1378920.0, ans=0.125 2024-08-12 01:05:14,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1378920.0, ans=0.125 2024-08-12 01:05:36,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.504e+01 2.763e+01 3.240e+01 5.325e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-12 01:05:47,908 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 01:05:59,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379220.0, ans=0.1 2024-08-12 01:05:59,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1379220.0, ans=0.025 2024-08-12 01:06:04,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7500, loss[loss=0.1096, beats_loss=0.01095, ecapa_loss=0.0001934, whisper_loss=0.09671, over 19927.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01115, ecapa_loss=0.0001857, whisper_loss=0.09233, over 3880544.93 frames. ], batch size: 80, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:06:06,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1379320.0, ans=0.125 2024-08-12 01:06:21,074 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 01:06:22,504 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 30 from Vox, 27 from AS 2024-08-12 01:06:23,896 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 01:06:25,313 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 01:06:32,720 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 01:06:36,959 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 from AS 2024-08-12 01:06:43,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379520.0, ans=0.1 2024-08-12 01:06:55,836 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 32 from Vox, 31 from AS 2024-08-12 01:06:57,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-12 01:07:06,947 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 from AS 2024-08-12 01:07:16,620 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7550, loss[loss=0.101, beats_loss=0.0115, ecapa_loss=0.0001888, whisper_loss=0.08757, over 21339.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001862, whisper_loss=0.09239, over 3847660.46 frames. ], batch size: 87, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:07:20,920 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 01:07:28,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1379820.0, ans=10.0 2024-08-12 01:07:35,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1379920.0, ans=0.0 2024-08-12 01:07:41,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1379920.0, ans=0.0 2024-08-12 01:07:59,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.522e+01 2.796e+01 3.153e+01 8.804e+01, threshold=5.592e+01, percent-clipped=1.0 2024-08-12 01:08:18,157 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.790e+02 2024-08-12 01:08:19,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1380220.0, ans=0.1 2024-08-12 01:08:24,486 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
21 from LS+wenet, 24 from Vox, 46 from AS 2024-08-12 01:08:28,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7600, loss[loss=0.1203, beats_loss=0.01035, ecapa_loss=0.0001633, whisper_loss=0.1083, over 22782.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01101, ecapa_loss=0.0001865, whisper_loss=0.09266, over 3822610.41 frames. ], batch size: 88, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:08:51,833 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 27 from Vox, 41 from AS 2024-08-12 01:09:07,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1380520.0, ans=0.125 2024-08-12 01:09:28,519 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 15 from Vox, 38 from AS 2024-08-12 01:09:33,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1380720.0, ans=0.125 2024-08-12 01:09:42,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0 2024-08-12 01:09:43,235 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 01:09:44,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7650, loss[loss=0.1013, beats_loss=0.01209, ecapa_loss=0.0001906, whisper_loss=0.08728, over 22652.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01101, ecapa_loss=0.0001865, whisper_loss=0.09333, over 3863870.77 frames. ], batch size: 95, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:09:47,947 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 01:09:55,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1380820.0, ans=0.125 2024-08-12 01:10:07,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1380920.0, ans=0.125 2024-08-12 01:10:23,816 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 01:10:25,198 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 13 from LS+wenet, 17 from Vox, 41 from AS 2024-08-12 01:10:30,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2024-08-12 01:10:30,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.632e+01 2.933e+01 3.294e+01 6.262e+01, threshold=5.865e+01, percent-clipped=1.0 2024-08-12 01:10:34,438 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-12 01:10:39,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1381120.0, ans=0.2 2024-08-12 01:10:41,499 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:10:44,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1381120.0, ans=0.125 2024-08-12 01:10:49,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1381220.0, ans=0.2 2024-08-12 01:10:55,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. 
limit=22.5 2024-08-12 01:11:02,642 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7700, loss[loss=0.1152, beats_loss=0.008544, ecapa_loss=0.0002124, whisper_loss=0.1046, over 19031.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01096, ecapa_loss=0.000187, whisper_loss=0.09395, over 3878909.06 frames. ], batch size: 73, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:11:26,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1381420.0, ans=0.2 2024-08-12 01:11:32,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-12 01:11:35,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1381520.0, ans=10.0 2024-08-12 01:11:59,644 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 from AS 2024-08-12 01:12:10,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.83 vs. limit=15.0 2024-08-12 01:12:16,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7750, loss[loss=0.09516, beats_loss=0.01051, ecapa_loss=0.000218, whisper_loss=0.08247, over 19024.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01104, ecapa_loss=0.0001878, whisper_loss=0.09293, over 3861618.52 frames. ], batch size: 79, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:12:17,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1381820.0, ans=0.125 2024-08-12 01:12:19,770 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
23 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 01:12:20,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2024-08-12 01:12:48,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1382020.0, ans=10.0 2024-08-12 01:12:54,616 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 from AS 2024-08-12 01:12:54,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1382020.0, ans=0.125 2024-08-12 01:13:00,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.543e+01 2.861e+01 3.273e+01 8.260e+01, threshold=5.723e+01, percent-clipped=1.0 2024-08-12 01:13:04,522 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 01:13:14,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1382220.0, ans=0.125 2024-08-12 01:13:31,319 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7800, loss[loss=0.1047, beats_loss=0.00941, ecapa_loss=0.0001882, whisper_loss=0.0934, over 17141.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.000186, whisper_loss=0.09328, over 3889733.59 frames. 
], batch size: 68, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:13:31,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1382320.0, ans=0.125 2024-08-12 01:13:31,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1382320.0, ans=10.0 2024-08-12 01:13:33,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1382320.0, ans=0.125 2024-08-12 01:13:45,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-12 01:14:09,021 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 from AS 2024-08-12 01:14:10,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1382520.0, ans=0.125 2024-08-12 01:14:21,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-08-12 01:14:21,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2024-08-12 01:14:24,998 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 from AS 2024-08-12 01:14:31,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1382720.0, ans=0.2 2024-08-12 01:14:43,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.22 vs.
limit=22.5 2024-08-12 01:14:45,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7850, loss[loss=0.08868, beats_loss=0.01409, ecapa_loss=0.000195, whisper_loss=0.07265, over 21460.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001862, whisper_loss=0.09303, over 3909853.75 frames. ], batch size: 94, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:14:52,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1382820.0, ans=0.125 2024-08-12 01:15:21,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1383020.0, ans=0.125 2024-08-12 01:15:25,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1383020.0, ans=0.125 2024-08-12 01:15:29,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.565e+01 2.814e+01 3.165e+01 4.880e+01, threshold=5.628e+01, percent-clipped=0.0 2024-08-12 01:15:44,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1383220.0, ans=0.125 2024-08-12 01:15:53,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1383220.0, ans=0.0 2024-08-12 01:15:58,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7900, loss[loss=0.08494, beats_loss=0.01414, ecapa_loss=0.0001346, whisper_loss=0.06946, over 18094.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01126, ecapa_loss=0.0001851, whisper_loss=0.09178, over 3889122.79 frames. 
], batch size: 71, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:15:58,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1383320.0, ans=0.2 2024-08-12 01:16:03,904 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 from AS 2024-08-12 01:16:13,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-08-12 01:16:24,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1383420.0, ans=0.125 2024-08-12 01:16:24,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1383420.0, ans=0.05 2024-08-12 01:16:36,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383520.0, ans=0.1 2024-08-12 01:16:38,368 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 from AS 2024-08-12 01:16:43,877 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 01:16:54,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1383620.0, ans=0.09899494936611666 2024-08-12 01:16:55,573 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS 2024-08-12 01:16:58,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1383720.0, ans=0.2 2024-08-12 01:16:59,975 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
26 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 01:17:03,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1383720.0, ans=0.5 2024-08-12 01:17:10,362 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 from AS 2024-08-12 01:17:12,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 7950, loss[loss=0.09508, beats_loss=0.01264, ecapa_loss=0.0001876, whisper_loss=0.08056, over 15362.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01125, ecapa_loss=0.0001851, whisper_loss=0.09196, over 3896836.44 frames. ], batch size: 62, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:17:19,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1383820.0, ans=0.0 2024-08-12 01:17:19,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=12.0 2024-08-12 01:17:23,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1383820.0, ans=0.0 2024-08-12 01:17:23,995 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 21 from Vox, 30 from AS 2024-08-12 01:17:24,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383820.0, ans=0.1 2024-08-12 01:17:28,303 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS 2024-08-12 01:17:34,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1383920.0, ans=0.125 2024-08-12 01:17:45,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.88 vs.
limit=22.5 2024-08-12 01:17:50,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384020.0, ans=0.1 2024-08-12 01:17:57,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.551e+01 2.931e+01 3.391e+01 6.201e+01, threshold=5.862e+01, percent-clipped=1.0 2024-08-12 01:18:08,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1384120.0, ans=0.125 2024-08-12 01:18:11,327 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 from AS 2024-08-12 01:18:14,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1384220.0, ans=0.2 2024-08-12 01:18:26,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8000, loss[loss=0.1178, beats_loss=0.00889, ecapa_loss=0.0001963, whisper_loss=0.1069, over 21317.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01116, ecapa_loss=0.0001846, whisper_loss=0.09236, over 3890800.67 frames. ], batch size: 86, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:18:37,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1384320.0, ans=0.2 2024-08-12 01:18:49,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1384420.0, ans=0.2 2024-08-12 01:18:56,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384520.0, ans=0.1 2024-08-12 01:18:56,661 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.115e-01 2024-08-12 01:19:01,936 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
24 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 01:19:20,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1384620.0, ans=0.125 2024-08-12 01:19:34,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-12 01:19:39,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8050, loss[loss=0.1104, beats_loss=0.0104, ecapa_loss=0.0001797, whisper_loss=0.09822, over 23829.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01108, ecapa_loss=0.0001848, whisper_loss=0.09302, over 3911212.68 frames. ], batch size: 91, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:19:44,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1384820.0, ans=0.125 2024-08-12 01:19:45,433 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 01:19:49,638 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 11 from Vox, 33 from AS 2024-08-12 01:19:57,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1384920.0, ans=0.125 2024-08-12 01:20:22,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1385120.0, ans=0.0 2024-08-12 01:20:22,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.542e+01 2.903e+01 3.299e+01 4.788e+01, threshold=5.807e+01, percent-clipped=0.0 2024-08-12 01:20:51,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8100, loss[loss=0.1015, beats_loss=0.01191, ecapa_loss=0.0002614, whisper_loss=0.08702, over 20720.00 frames.
], tot_loss[loss=0.1055, beats_loss=0.01116, ecapa_loss=0.0001849, whisper_loss=0.09251, over 3937389.26 frames. ], batch size: 92, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:20:55,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1385320.0, ans=0.025 2024-08-12 01:21:06,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1385420.0, ans=0.1 2024-08-12 01:21:08,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2024-08-12 01:21:11,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1385420.0, ans=0.125 2024-08-12 01:21:16,708 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS 2024-08-12 01:21:17,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1385420.0, ans=0.2 2024-08-12 01:21:24,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-08-12 01:21:30,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.80 vs.
limit=15.0 2024-08-12 01:21:43,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1385620.0, ans=0.1 2024-08-12 01:21:47,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1385620.0, ans=0.0 2024-08-12 01:21:52,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1385720.0, ans=0.125 2024-08-12 01:21:52,922 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 from AS 2024-08-12 01:21:53,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1385720.0, ans=0.125 2024-08-12 01:22:04,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8150, loss[loss=0.1026, beats_loss=0.01122, ecapa_loss=0.0001841, whisper_loss=0.08957, over 16244.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001858, whisper_loss=0.09235, over 3928212.64 frames. ], batch size: 66, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:22:05,876 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
23 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 01:22:19,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1385920.0, ans=0.0 2024-08-12 01:22:29,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1385920.0, ans=0.0 2024-08-12 01:22:29,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1385920.0, ans=0.0 2024-08-12 01:22:36,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1386020.0, ans=0.07 2024-08-12 01:22:47,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.599e+01 2.928e+01 3.345e+01 4.607e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:23:00,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1386120.0, ans=0.125 2024-08-12 01:23:17,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8200, loss[loss=0.09651, beats_loss=0.01267, ecapa_loss=0.0001935, whisper_loss=0.08191, over 15808.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001861, whisper_loss=0.0921, over 3922746.08 frames. ], batch size: 65, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:23:20,625 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
15 from LS+wenet, 26 from Vox, 30 from AS 2024-08-12 01:23:25,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1386320.0, ans=0.0 2024-08-12 01:23:52,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1386520.0, ans=0.0 2024-08-12 01:24:02,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-12 01:24:25,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1386720.0, ans=0.125 2024-08-12 01:24:25,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1386720.0, ans=0.125 2024-08-12 01:24:26,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1386720.0, ans=0.125 2024-08-12 01:24:32,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8250, loss[loss=0.09145, beats_loss=0.01235, ecapa_loss=0.000172, whisper_loss=0.07738, over 14196.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001869, whisper_loss=0.09161, over 3905791.73 frames. ], batch size: 56, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:24:38,165 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 01:24:45,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1386920.0, ans=0.125 2024-08-12 01:24:51,263 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
33 from LS+wenet, 25 from Vox, 33 from AS 2024-08-12 01:24:51,565 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:24:52,718 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 11 from Vox, 25 from AS 2024-08-12 01:25:16,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.606e+01 2.891e+01 3.345e+01 5.457e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-12 01:25:34,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1387220.0, ans=0.125 2024-08-12 01:25:46,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8300, loss[loss=0.1065, beats_loss=0.01104, ecapa_loss=0.0001677, whisper_loss=0.09374, over 22573.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001843, whisper_loss=0.09201, over 3915163.22 frames. ], batch size: 89, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:25:52,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-12 01:26:05,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1387420.0, ans=0.1 2024-08-12 01:26:21,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-08-12 01:26:22,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1387520.0, ans=0.125 2024-08-12 01:26:32,678 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 01:26:38,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1387620.0, ans=0.2 2024-08-12 01:26:40,707 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 01:26:42,166 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-12 01:26:56,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1387720.0, ans=0.125 2024-08-12 01:27:02,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8350, loss[loss=0.1275, beats_loss=0.009352, ecapa_loss=0.0001365, whisper_loss=0.1168, over 21771.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01118, ecapa_loss=0.000184, whisper_loss=0.09142, over 3952225.75 frames. ], batch size: 77, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:27:05,810 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-12 01:27:15,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1387920.0, ans=0.0 2024-08-12 01:27:16,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.71 vs. limit=10.0 2024-08-12 01:27:18,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=22.5 2024-08-12 01:27:23,713 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 01:27:37,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1388020.0, ans=0.025 2024-08-12 01:27:38,409 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
27 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 01:27:41,560 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 from AS 2024-08-12 01:27:47,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.752e+01 3.106e+01 3.684e+01 1.573e+02, threshold=6.213e+01, percent-clipped=3.0 2024-08-12 01:27:55,048 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-12 01:28:16,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8400, loss[loss=0.1123, beats_loss=0.008395, ecapa_loss=0.0001889, whisper_loss=0.102, over 16058.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001841, whisper_loss=0.09224, over 3951567.53 frames. ], batch size: 63, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:28:19,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1388320.0, ans=0.125 2024-08-12 01:29:13,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-08-12 01:29:21,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1388720.0, ans=0.0 2024-08-12 01:29:25,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2024-08-12 01:29:29,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8450, loss[loss=0.1097, beats_loss=0.0103, ecapa_loss=0.0002228, whisper_loss=0.09722, over 14697.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001858, whisper_loss=0.09224, over 3923896.80 frames.
], batch size: 60, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:29:43,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1388920.0, ans=0.125 2024-08-12 01:30:03,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1389020.0, ans=0.05 2024-08-12 01:30:11,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1389120.0, ans=0.0 2024-08-12 01:30:12,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.661e+01 3.023e+01 3.413e+01 6.376e+01, threshold=6.047e+01, percent-clipped=1.0 2024-08-12 01:30:40,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8500, loss[loss=0.1072, beats_loss=0.009936, ecapa_loss=0.0001962, whisper_loss=0.09529, over 20296.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01118, ecapa_loss=0.0001846, whisper_loss=0.09155, over 3929433.55 frames. ], batch size: 80, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:30:41,819 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 01:30:46,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1389320.0, ans=0.125 2024-08-12 01:30:47,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2024-08-12 01:30:50,701 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
18 from LS+wenet, 25 from Vox, 24 from AS 2024-08-12 01:30:52,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1389320.0, ans=0.0 2024-08-12 01:30:54,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1389420.0, ans=0.1 2024-08-12 01:30:56,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-08-12 01:30:58,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1389420.0, ans=0.0 2024-08-12 01:31:11,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1389520.0, ans=0.125 2024-08-12 01:31:35,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1389620.0, ans=0.2 2024-08-12 01:31:42,235 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS 2024-08-12 01:31:50,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2024-08-12 01:31:52,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8550, loss[loss=0.108, beats_loss=0.01231, ecapa_loss=0.000163, whisper_loss=0.09408, over 22646.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01119, ecapa_loss=0.0001849, whisper_loss=0.09176, over 3917297.52 frames.
], batch size: 92, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:32:04,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1389820.0, ans=0.125 2024-08-12 01:32:37,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.566e+01 2.875e+01 3.249e+01 7.628e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 01:32:47,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1390120.0, ans=0.125 2024-08-12 01:32:48,633 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-12 01:33:03,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1390320.0, ans=0.0 2024-08-12 01:33:03,805 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8600, loss[loss=0.09477, beats_loss=0.01144, ecapa_loss=0.0001543, whisper_loss=0.08178, over 19763.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001858, whisper_loss=0.09159, over 3906385.48 frames.
], batch size: 77, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:33:16,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1390320.0, ans=0.125 2024-08-12 01:33:21,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1390420.0, ans=0.125 2024-08-12 01:33:27,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1390420.0, ans=0.0 2024-08-12 01:33:37,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1390520.0, ans=0.1 2024-08-12 01:33:50,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2024-08-12 01:34:05,416 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.463e-02 2024-08-12 01:34:10,354 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 from AS 2024-08-12 01:34:14,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8650, loss[loss=0.0877, beats_loss=0.01252, ecapa_loss=0.0001532, whisper_loss=0.07365, over 17340.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01117, ecapa_loss=0.0001847, whisper_loss=0.09206, over 3894964.67 frames. ], batch size: 69, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:34:35,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1390920.0, ans=0.2 2024-08-12 01:34:40,501 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
31 from LS+wenet, 27 from Vox, 36 from AS 2024-08-12 01:34:57,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.624e+01 3.118e+01 3.764e+01 6.887e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-12 01:35:06,313 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS 2024-08-12 01:35:10,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1391220.0, ans=0.125 2024-08-12 01:35:15,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1391220.0, ans=0.125 2024-08-12 01:35:25,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8700, loss[loss=0.08729, beats_loss=0.01328, ecapa_loss=0.0001521, whisper_loss=0.0725, over 20946.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01122, ecapa_loss=0.0001848, whisper_loss=0.09156, over 3877288.88 frames. ], batch size: 82, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:35:27,118 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 from AS 2024-08-12 01:35:39,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1391420.0, ans=0.04949747468305833 2024-08-12 01:35:42,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1391420.0, ans=0.125 2024-08-12 01:35:45,363 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS 2024-08-12 01:36:00,599 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 from AS 2024-08-12 01:36:11,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1391620.0, ans=0.125 2024-08-12 01:36:29,409 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
26 from LS+wenet, 29 from Vox, 35 from AS 2024-08-12 01:36:39,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8750, loss[loss=0.09669, beats_loss=0.013, ecapa_loss=0.0001402, whisper_loss=0.08229, over 17125.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0113, ecapa_loss=0.000184, whisper_loss=0.09105, over 3885136.89 frames. ], batch size: 64, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:36:41,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1391820.0, ans=0.05 2024-08-12 01:37:13,857 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS 2024-08-12 01:37:25,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.651e+01 2.928e+01 3.365e+01 6.201e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:37:54,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8800, loss[loss=0.09785, beats_loss=0.01207, ecapa_loss=0.0001775, whisper_loss=0.084, over 22118.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01128, ecapa_loss=0.0001836, whisper_loss=0.09178, over 3901959.64 frames. ], batch size: 88, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:38:09,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1392420.0, ans=0.0 2024-08-12 01:38:22,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=8.0 2024-08-12 01:38:49,408 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-12 01:38:55,723 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 from AS 2024-08-12 01:39:07,670 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 01:39:08,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8850, loss[loss=0.1116, beats_loss=0.009137, ecapa_loss=0.000179, whisper_loss=0.1007, over 23909.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01135, ecapa_loss=0.0001825, whisper_loss=0.09165, over 3931145.82 frames. ], batch size: 93, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:39:10,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1392820.0, ans=0.125 2024-08-12 01:39:18,550 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 01:39:32,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1392920.0, ans=0.2 2024-08-12 01:39:37,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1393020.0, ans=0.1 2024-08-12 01:39:40,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.89 vs. limit=22.5 2024-08-12 01:39:42,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1393020.0, ans=0.125 2024-08-12 01:39:43,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1393020.0, ans=0.125 2024-08-12 01:39:53,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.605e+01 2.898e+01 3.315e+01 6.590e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-12 01:39:57,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. 
limit=6.0 2024-08-12 01:39:59,215 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 01:39:59,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1393120.0, ans=0.2 2024-08-12 01:40:12,000 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 01:40:20,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8900, loss[loss=0.1193, beats_loss=0.01094, ecapa_loss=0.0001683, whisper_loss=0.1066, over 19389.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01135, ecapa_loss=0.0001825, whisper_loss=0.09152, over 3886844.62 frames. ], batch size: 74, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:40:20,668 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 01:40:27,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1393320.0, ans=0.0 2024-08-12 01:40:36,289 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 01:40:41,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1393420.0, ans=10.0 2024-08-12 01:40:54,326 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 01:41:07,003 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 01:41:07,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1393620.0, ans=0.125 2024-08-12 01:41:31,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 8950, loss[loss=0.1216, beats_loss=0.007561, ecapa_loss=0.0001954, whisper_loss=0.1121, over 15371.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01135, ecapa_loss=0.0001826, whisper_loss=0.09196, over 3901087.19 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:41:45,997 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 01:41:50,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1393920.0, ans=0.125 2024-08-12 01:42:03,589 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 01:42:03,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1394020.0, ans=0.125 2024-08-12 01:42:06,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1394020.0, ans=0.0 2024-08-12 01:42:13,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.694e+01 3.111e+01 3.699e+01 1.037e+02, threshold=6.222e+01, percent-clipped=1.0 2024-08-12 01:42:19,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2024-08-12 01:42:28,629 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 01:42:30,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1394220.0, ans=0.125 2024-08-12 01:42:33,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1394220.0, ans=0.1 2024-08-12 01:42:38,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9000, loss[loss=0.08768, beats_loss=0.01181, ecapa_loss=0.0001951, whisper_loss=0.07392, over 19944.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01133, ecapa_loss=0.0001835, whisper_loss=0.09216, over 3903151.65 frames. ], batch size: 84, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:42:38,985 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 01:43:16,665 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006076, whisper_loss=0.2507, over 922467.00 frames. 2024-08-12 01:43:34,709 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on SV_voxceleb1: loss=0.005114, beats_loss=0, ecapa_loss=0.0005114, whisper_loss=0, over 939242.00 frames. 2024-08-12 01:45:19,177 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 01:45:19,187 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 01:45:22,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1394320.0, ans=0.015 2024-08-12 01:45:41,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2024-08-12 01:45:46,934 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-12 01:45:58,310 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:46:11,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. 
limit=10.0 2024-08-12 01:46:15,710 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.653e-02 2024-08-12 01:46:17,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1394720.0, ans=0.0 2024-08-12 01:46:25,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1394720.0, ans=0.0 2024-08-12 01:46:26,426 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 01:46:28,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9050, loss[loss=0.09137, beats_loss=0.01108, ecapa_loss=0.0002722, whisper_loss=0.07757, over 19532.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01125, ecapa_loss=0.0001841, whisper_loss=0.09219, over 3889730.92 frames. ], batch size: 90, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:46:29,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1394820.0, ans=0.125 2024-08-12 01:46:39,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1394820.0, ans=0.125 2024-08-12 01:46:45,892 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-12 01:46:46,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1394920.0, ans=0.2 2024-08-12 01:46:51,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1394920.0, ans=0.0 2024-08-12 01:46:59,779 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 01:47:11,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.578e+01 2.935e+01 3.281e+01 5.128e+01, threshold=5.870e+01, percent-clipped=0.0 2024-08-12 01:47:13,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1395120.0, ans=0.2 2024-08-12 01:47:14,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.41 vs. limit=15.0 2024-08-12 01:47:25,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2024-08-12 01:47:29,737 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 01:47:37,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9100, loss[loss=0.1169, beats_loss=0.009686, ecapa_loss=0.000243, whisper_loss=0.1048, over 18147.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01113, ecapa_loss=0.0001849, whisper_loss=0.09312, over 3875857.53 frames. ], batch size: 76, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:47:50,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1395420.0, ans=0.09899494936611666 2024-08-12 01:48:05,007 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-12 01:48:18,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1395620.0, ans=0.1 2024-08-12 01:48:18,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1395620.0, ans=0.1 2024-08-12 01:48:27,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1395620.0, ans=0.1 2024-08-12 01:48:27,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1395620.0, ans=0.125 2024-08-12 01:48:28,621 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 01:48:31,222 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 01:48:31,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.25 vs. limit=10.0 2024-08-12 01:48:37,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1395720.0, ans=0.05 2024-08-12 01:48:45,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9150, loss[loss=0.09855, beats_loss=0.01208, ecapa_loss=0.0001882, whisper_loss=0.08459, over 22096.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001865, whisper_loss=0.09334, over 3915466.68 frames. ], batch size: 94, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:48:47,307 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 01:48:47,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1395820.0, ans=0.125 2024-08-12 01:48:48,683 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 01:48:55,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1395820.0, ans=0.0 2024-08-12 01:49:12,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1396020.0, ans=0.0 2024-08-12 01:49:22,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1396020.0, ans=0.125 2024-08-12 01:49:25,926 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 01:49:28,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.582e+01 2.877e+01 3.376e+01 5.392e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-12 01:49:33,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1396120.0, ans=0.125 2024-08-12 01:49:39,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1396220.0, ans=0.2 2024-08-12 01:49:53,983 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9200, loss[loss=0.1254, beats_loss=0.008939, ecapa_loss=0.0002239, whisper_loss=0.1143, over 20912.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.093, over 3936992.48 frames. 
], batch size: 83, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:50:26,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1396520.0, ans=0.0 2024-08-12 01:50:45,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1396620.0, ans=0.125 2024-08-12 01:50:52,112 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-12 01:50:53,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1396720.0, ans=0.2 2024-08-12 01:51:02,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9250, loss[loss=0.105, beats_loss=0.009497, ecapa_loss=0.0002159, whisper_loss=0.0933, over 22473.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0001859, whisper_loss=0.09267, over 3903158.69 frames. ], batch size: 91, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:51:05,417 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-12 01:51:29,454 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 01:51:33,351 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-12 01:51:40,251 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-12 01:51:44,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.700e+01 2.936e+01 3.310e+01 8.820e+01, threshold=5.872e+01, percent-clipped=1.0 2024-08-12 01:51:47,412 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 01:51:49,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1397120.0, ans=0.0 2024-08-12 01:51:52,979 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 01:52:00,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1397220.0, ans=0.125 2024-08-12 01:52:09,116 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 01:52:10,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9300, loss[loss=0.1015, beats_loss=0.01195, ecapa_loss=0.0001845, whisper_loss=0.08774, over 17438.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001854, whisper_loss=0.09273, over 3926924.10 frames. ], batch size: 69, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:52:35,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1397420.0, ans=0.125 2024-08-12 01:52:35,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1397420.0, ans=0.0 2024-08-12 01:52:38,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1397520.0, ans=0.0 2024-08-12 01:52:42,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1397520.0, ans=0.1 2024-08-12 01:53:18,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1397820.0, ans=0.0 2024-08-12 01:53:19,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9350, loss[loss=0.08468, beats_loss=0.0111, ecapa_loss=0.0002459, whisper_loss=0.07112, over 15146.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001858, whisper_loss=0.0932, over 3899570.33 frames. ], batch size: 64, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:53:30,747 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 40 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-12 01:53:31,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1397820.0, ans=0.0 2024-08-12 01:53:50,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1398020.0, ans=0.0 2024-08-12 01:53:54,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1398020.0, ans=0.0 2024-08-12 01:54:02,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.050e+01 2.487e+01 2.851e+01 3.233e+01 4.318e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-12 01:54:04,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1398120.0, ans=0.0 2024-08-12 01:54:13,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1398120.0, ans=0.125 2024-08-12 01:54:20,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=10.0 2024-08-12 01:54:21,316 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 01:54:29,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9400, loss[loss=0.1118, beats_loss=0.009853, ecapa_loss=0.0001684, whisper_loss=0.1002, over 21424.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001856, whisper_loss=0.09299, over 3909239.95 frames. 
], batch size: 80, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:54:38,741 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 01:55:00,957 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 01:55:06,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1398520.0, ans=0.125 2024-08-12 01:55:38,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9450, loss[loss=0.1141, beats_loss=0.01127, ecapa_loss=0.000199, whisper_loss=0.1009, over 22350.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01107, ecapa_loss=0.0001875, whisper_loss=0.09308, over 3883294.06 frames. ], batch size: 91, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:56:10,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.42 vs. limit=22.5 2024-08-12 01:56:13,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. 
limit=15.0 2024-08-12 01:56:19,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1399120.0, ans=0.125 2024-08-12 01:56:20,509 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.626e+01 2.954e+01 3.375e+01 5.231e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-12 01:56:21,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1399120.0, ans=0.125 2024-08-12 01:56:22,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1399120.0, ans=0.125 2024-08-12 01:56:32,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1399220.0, ans=0.125 2024-08-12 01:56:42,803 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 01:56:46,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9500, loss[loss=0.1002, beats_loss=0.01346, ecapa_loss=0.0001603, whisper_loss=0.08517, over 19612.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001871, whisper_loss=0.09247, over 3882938.26 frames. ], batch size: 79, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:56:55,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-12 01:57:06,263 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 01:57:06,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1399420.0, ans=0.1 2024-08-12 01:57:10,892 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 01:57:18,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5 2024-08-12 01:57:27,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.25 vs. limit=15.0 2024-08-12 01:57:47,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1399720.0, ans=0.125 2024-08-12 01:57:52,132 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 01:57:56,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9550, loss[loss=0.1024, beats_loss=0.01246, ecapa_loss=0.0001753, whisper_loss=0.0882, over 19964.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001873, whisper_loss=0.09183, over 3903829.77 frames. ], batch size: 83, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:58:32,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1400020.0, ans=0.1 2024-08-12 01:58:34,158 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 01:58:40,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.623e+01 2.882e+01 3.186e+01 4.825e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 01:58:50,920 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 14 from Vox, 51 fro AS 2024-08-12 01:58:52,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1400220.0, ans=0.2 2024-08-12 01:58:53,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1400220.0, ans=0.0 2024-08-12 01:59:05,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1400320.0, ans=0.0 2024-08-12 01:59:06,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9600, loss[loss=0.1061, beats_loss=0.00989, ecapa_loss=0.0002086, whisper_loss=0.09408, over 21899.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001864, whisper_loss=0.09269, over 3910803.11 frames. ], batch size: 89, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:59:21,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400420.0, ans=0.1 2024-08-12 01:59:28,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1400420.0, ans=0.125 2024-08-12 01:59:34,186 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 01:59:51,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1400620.0, ans=0.0 2024-08-12 01:59:54,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-12 02:00:16,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9650, loss[loss=0.1065, beats_loss=0.01109, ecapa_loss=0.000189, whisper_loss=0.09348, over 23662.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01107, ecapa_loss=0.0001874, whisper_loss=0.0927, over 3880019.24 frames. ], batch size: 94, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:00:37,452 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 02:01:00,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.704e+01 3.034e+01 3.483e+01 7.919e+01, threshold=6.068e+01, percent-clipped=1.0 2024-08-12 02:01:04,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1401120.0, ans=0.125 2024-08-12 02:01:09,659 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 02:01:16,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2024-08-12 02:01:26,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9700, loss[loss=0.1234, beats_loss=0.01064, ecapa_loss=0.0001709, whisper_loss=0.111, over 22996.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001874, whisper_loss=0.09281, over 3891944.69 frames. ], batch size: 89, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:01:28,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-08-12 02:01:44,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1401420.0, ans=0.0 2024-08-12 02:01:56,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1401520.0, ans=0.0 2024-08-12 02:01:58,691 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 02:02:01,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2024-08-12 02:02:28,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0 2024-08-12 02:02:36,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9750, loss[loss=0.1038, beats_loss=0.01029, ecapa_loss=0.0001811, whisper_loss=0.0917, over 18940.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001859, whisper_loss=0.09286, over 3886767.18 frames. ], batch size: 79, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:03:07,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1402020.0, ans=0.0 2024-08-12 02:03:11,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-12 02:03:14,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1402020.0, ans=0.025 2024-08-12 02:03:20,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.664e+01 3.101e+01 3.565e+01 5.192e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-12 02:03:42,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1402220.0, ans=0.05 2024-08-12 02:03:45,080 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 02:03:45,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1402220.0, ans=0.2 2024-08-12 02:03:46,601 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-12 02:03:47,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1402320.0, ans=0.2 2024-08-12 02:03:47,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9800, loss[loss=0.1264, beats_loss=0.011, ecapa_loss=0.0001658, whisper_loss=0.1138, over 20997.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01106, ecapa_loss=0.0001854, whisper_loss=0.09285, over 3880104.64 frames. ], batch size: 81, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:03:54,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1402320.0, ans=0.125 2024-08-12 02:04:07,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1402420.0, ans=0.0 2024-08-12 02:04:49,531 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 02:04:54,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1402720.0, ans=0.0 2024-08-12 02:04:54,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2024-08-12 02:04:58,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9850, loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001773, whisper_loss=0.09295, over 23125.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0111, ecapa_loss=0.0001854, whisper_loss=0.09287, over 3870723.79 frames. 
], batch size: 94, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:05:00,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1402820.0, ans=0.125 2024-08-12 02:05:10,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1402820.0, ans=0.125 2024-08-12 02:05:21,741 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.323e+00 2024-08-12 02:05:42,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.518e+01 2.832e+01 3.271e+01 6.017e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 02:05:55,801 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 from AS 2024-08-12 02:06:01,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.45 vs. limit=22.5 2024-08-12 02:06:09,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9900, loss[loss=0.1102, beats_loss=0.01203, ecapa_loss=0.0001884, whisper_loss=0.09633, over 21891.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001846, whisper_loss=0.09329, over 3875958.33 frames. ], batch size: 92, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:06:39,691 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 02:06:39,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1403520.0, ans=0.2 2024-08-12 02:06:47,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. 
limit=15.0 2024-08-12 02:06:48,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1403520.0, ans=0.2 2024-08-12 02:07:07,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1403720.0, ans=0.0 2024-08-12 02:07:11,450 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 02:07:12,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1403720.0, ans=0.0 2024-08-12 02:07:20,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 9950, loss[loss=0.1185, beats_loss=0.009975, ecapa_loss=0.0002189, whisper_loss=0.1063, over 14260.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01117, ecapa_loss=0.0001839, whisper_loss=0.0926, over 3855708.54 frames. ], batch size: 59, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:07:30,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1403820.0, ans=0.0 2024-08-12 02:07:44,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1403920.0, ans=0.125 2024-08-12 02:07:57,036 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 02:08:03,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.549e+01 2.857e+01 3.293e+01 8.751e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 02:08:17,474 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS 2024-08-12 02:08:19,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.00 vs. 
limit=15.0 2024-08-12 02:08:29,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10000, loss[loss=0.09114, beats_loss=0.01366, ecapa_loss=0.0001548, whisper_loss=0.07593, over 23551.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0112, ecapa_loss=0.0001846, whisper_loss=0.09212, over 3873209.55 frames. ], batch size: 93, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:08:42,026 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:09:18,009 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 26 from Vox, 25 from AS 2024-08-12 02:09:22,270 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 from AS 2024-08-12 02:09:25,521 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 02:09:37,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1404720.0, ans=0.0 2024-08-12 02:09:44,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10050, loss[loss=0.09241, beats_loss=0.01233, ecapa_loss=0.0001674, whisper_loss=0.0784, over 22324.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01111, ecapa_loss=0.000186, whisper_loss=0.09232, over 3856608.69 frames. ], batch size: 93, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:09:47,421 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 31 from Vox, 27 from AS 2024-08-12 02:10:00,670 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.053e-02 2024-08-12 02:10:12,311 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 22 from Vox, 28 from AS 2024-08-12 02:10:12,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405020.0, ans=0.1 2024-08-12 02:10:18,727 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS 2024-08-12 02:10:30,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1405120.0, ans=0.0 2024-08-12 02:10:30,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.648e+01 2.983e+01 3.418e+01 4.523e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 02:10:42,533 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 29 from Vox, 29 from AS 2024-08-12 02:10:44,358 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 from AS 2024-08-12 02:10:48,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-08-12 02:11:02,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1405320.0, ans=0.2 2024-08-12 02:11:02,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10100, loss[loss=0.08897, beats_loss=0.01432, ecapa_loss=0.0001696, whisper_loss=0.07296, over 18495.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01116, ecapa_loss=0.0001847, whisper_loss=0.09204, over 3891942.57 frames. ], batch size: 77, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:11:10,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=12.0 2024-08-12 02:11:13,160 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-12 02:11:26,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1405420.0, ans=0.0 2024-08-12 02:11:46,103 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS 2024-08-12 02:11:49,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1405520.0, ans=0.125 2024-08-12 02:12:11,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1405720.0, ans=0.125 2024-08-12 02:12:13,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1405720.0, ans=0.0 2024-08-12 02:12:18,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1405720.0, ans=0.0 2024-08-12 02:12:19,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1405720.0, ans=0.0 2024-08-12 02:12:23,504 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 from AS 2024-08-12 02:12:27,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10150, loss[loss=0.1107, beats_loss=0.01073, ecapa_loss=0.0002207, whisper_loss=0.09776, over 15486.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01113, ecapa_loss=0.0001862, whisper_loss=0.09208, over 3876634.88 frames. ], batch size: 64, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:13:12,655 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 from AS 2024-08-12 02:13:15,894 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 02:13:23,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.579e+01 2.918e+01 3.241e+01 4.906e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-12 02:13:36,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1406120.0, ans=0.0 2024-08-12 02:13:46,699 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 20 from Vox, 17 from AS 2024-08-12 02:13:50,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1406220.0, ans=0.0 2024-08-12 02:14:04,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1406220.0, ans=0.125 2024-08-12 02:14:07,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1406320.0, ans=0.0 2024-08-12 02:14:07,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10200, loss[loss=0.08928, beats_loss=0.01289, ecapa_loss=0.0002512, whisper_loss=0.07387, over 19584.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001875, whisper_loss=0.09215, over 3890369.44 frames. ], batch size: 88, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:14:19,722 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 from AS 2024-08-12 02:14:30,553 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 02:14:40,789 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 from AS 2024-08-12 02:14:49,825 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
17 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 02:14:53,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1406520.0, ans=0.2 2024-08-12 02:14:54,233 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 02:15:08,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1406520.0, ans=0.0 2024-08-12 02:15:16,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.93 vs. limit=10.0 2024-08-12 02:15:21,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406620.0, ans=0.1 2024-08-12 02:15:28,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.41 vs. limit=22.5 2024-08-12 02:15:43,225 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 02:15:44,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1406720.0, ans=0.125 2024-08-12 02:16:01,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10250, loss[loss=0.1124, beats_loss=0.01224, ecapa_loss=0.0001627, whisper_loss=0.09851, over 23061.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.0001881, whisper_loss=0.09187, over 3878302.72 frames. ], batch size: 91, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:16:07,656 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 22 from Vox, 24 from AS 2024-08-12 02:16:38,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.80 vs. 
limit=12.0 2024-08-12 02:17:03,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1407120.0, ans=0.1 2024-08-12 02:17:04,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.647e+01 2.891e+01 3.478e+01 5.936e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-12 02:17:11,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1407120.0, ans=0.0 2024-08-12 02:17:34,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1407220.0, ans=0.1 2024-08-12 02:17:43,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10300, loss[loss=0.1389, beats_loss=0.009007, ecapa_loss=0.0002052, whisper_loss=0.1278, over 23102.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001877, whisper_loss=0.0916, over 3891673.00 frames. ], batch size: 90, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:17:54,620 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 from AS 2024-08-12 02:18:27,726 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 from AS 2024-08-12 02:18:31,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1407520.0, ans=0.125 2024-08-12 02:18:38,420 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 from AS 2024-08-12 02:19:10,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1407820.0, ans=0.0 2024-08-12 02:19:11,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10350, loss[loss=0.1093, beats_loss=0.009763, ecapa_loss=0.0002081, whisper_loss=0.09746, over 21651.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01118, ecapa_loss=0.0001868, whisper_loss=0.09147, over 3914180.65 frames. ], batch size: 91, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:19:21,884 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 02:19:45,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1408020.0, ans=0.2 2024-08-12 02:19:45,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-12 02:19:56,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.600e+01 2.842e+01 3.107e+01 4.520e+01, threshold=5.684e+01, percent-clipped=0.0 2024-08-12 02:20:23,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1408220.0, ans=0.125 2024-08-12 02:20:25,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10400, loss[loss=0.09406, beats_loss=0.01433, ecapa_loss=0.0001599, whisper_loss=0.07813, over 23506.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01117, ecapa_loss=0.0001863, whisper_loss=0.09173, over 3895512.68 frames. ], batch size: 92, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:20:29,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-12 02:20:41,756 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 from AS 2024-08-12 02:20:43,066 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 02:20:46,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1408420.0, ans=0.0 2024-08-12 02:21:05,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1408520.0, ans=0.0 2024-08-12 02:21:05,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1408520.0, ans=0.025 2024-08-12 02:21:21,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1408620.0, ans=0.125 2024-08-12 02:21:37,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10450, loss[loss=0.1028, beats_loss=0.01112, ecapa_loss=0.0001818, whisper_loss=0.08982, over 20795.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01114, ecapa_loss=0.0001858, whisper_loss=0.09167, over 3866612.99 frames. ], batch size: 85, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:21:49,794 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 from AS 2024-08-12 02:21:52,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-12 02:21:54,150 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
18 from LS+wenet, 16 from Vox, 31 from AS 2024-08-12 02:21:54,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1408920.0, ans=0.125 2024-08-12 02:22:01,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1408920.0, ans=0.2 2024-08-12 02:22:05,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-12 02:22:06,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1409020.0, ans=0.1 2024-08-12 02:22:15,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=15.0 2024-08-12 02:22:20,628 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 from AS 2024-08-12 02:22:21,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.627e+01 2.925e+01 3.348e+01 4.455e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 02:22:23,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1409120.0, ans=0.125 2024-08-12 02:22:24,736 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 02:22:27,837 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 from AS 2024-08-12 02:22:36,126 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS 2024-08-12 02:22:49,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10500, loss[loss=0.09654, beats_loss=0.01246, ecapa_loss=0.0001877, whisper_loss=0.08221, over 22909.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001862, whisper_loss=0.0918, over 3895530.46 frames. ], batch size: 92, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:22:54,124 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS 2024-08-12 02:23:01,729 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.336e+01 2024-08-12 02:23:03,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1409420.0, ans=0.125 2024-08-12 02:23:04,406 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 02:23:06,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1409420.0, ans=0.1 2024-08-12 02:23:09,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1409420.0, ans=0.125 2024-08-12 02:23:44,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1409620.0, ans=0.09899494936611666 2024-08-12 02:23:44,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1409620.0, ans=0.09899494936611666 2024-08-12 02:24:02,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10550, loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001973, whisper_loss=0.09039, over 15633.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01109, ecapa_loss=0.0001873, whisper_loss=0.09171, over 3868881.31 frames. 
], batch size: 63, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:24:07,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1409820.0, ans=0.0 2024-08-12 02:24:24,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1409920.0, ans=0.0 2024-08-12 02:24:25,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1409920.0, ans=0.0 2024-08-12 02:24:32,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1410020.0, ans=0.125 2024-08-12 02:24:35,367 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 02:24:46,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.599e+01 2.845e+01 3.296e+01 6.744e+01, threshold=5.691e+01, percent-clipped=1.0 2024-08-12 02:24:52,284 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 02:25:00,793 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 02:25:07,615 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 02:25:12,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1410320.0, ans=0.125 2024-08-12 02:25:13,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10600, loss[loss=0.1045, beats_loss=0.01153, ecapa_loss=0.0001835, whisper_loss=0.09109, over 22455.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01114, ecapa_loss=0.0001854, whisper_loss=0.09193, over 3863668.04 frames. ], batch size: 92, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:25:39,939 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 30 from Vox, 31 from AS 2024-08-12 02:25:43,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1410520.0, ans=0.02 2024-08-12 02:25:45,491 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS 2024-08-12 02:25:48,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1410520.0, ans=0.125 2024-08-12 02:26:11,872 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-12 02:26:22,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10650, loss[loss=0.1339, beats_loss=0.009509, ecapa_loss=0.0002051, whisper_loss=0.1224, over 22437.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001849, whisper_loss=0.09263, over 3863750.68 frames. ], batch size: 90, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:26:28,043 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 02:26:30,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1410820.0, ans=0.125 2024-08-12 02:26:33,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1410820.0, ans=0.0 2024-08-12 02:27:00,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1411020.0, ans=0.125 2024-08-12 02:27:04,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.646e+01 2.959e+01 3.392e+01 4.637e+01, threshold=5.918e+01, percent-clipped=0.0 2024-08-12 02:27:13,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. 
limit=8.0 2024-08-12 02:27:21,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1411220.0, ans=0.125 2024-08-12 02:27:29,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1411320.0, ans=0.125 2024-08-12 02:27:29,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1411320.0, ans=0.125 2024-08-12 02:27:30,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10700, loss[loss=0.09679, beats_loss=0.01156, ecapa_loss=0.0002183, whisper_loss=0.08305, over 18897.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01118, ecapa_loss=0.0001838, whisper_loss=0.09242, over 3863202.69 frames. ], batch size: 78, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:27:45,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1411420.0, ans=0.125 2024-08-12 02:27:55,001 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 from AS 2024-08-12 02:28:14,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1411620.0, ans=0.125 2024-08-12 02:28:25,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1411720.0, ans=0.125 2024-08-12 02:28:26,342 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 02:28:34,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=15.0 2024-08-12 02:28:40,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10750, loss[loss=0.1052, beats_loss=0.01131, ecapa_loss=0.0002471, whisper_loss=0.09146, over 21142.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001846, whisper_loss=0.09319, over 3888566.40 frames. ], batch size: 91, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:28:49,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1411820.0, ans=0.125 2024-08-12 02:28:55,418 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 02:29:03,484 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 02:29:07,481 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS 2024-08-12 02:29:22,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.596e+01 2.921e+01 3.440e+01 9.548e+01, threshold=5.843e+01, percent-clipped=1.0 2024-08-12 02:29:27,122 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 from AS 2024-08-12 02:29:31,194 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 18 from Vox, 37 from AS 2024-08-12 02:29:32,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=12.0 2024-08-12 02:29:38,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1412220.0, ans=0.125 2024-08-12 02:29:48,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10800, loss[loss=0.08386, beats_loss=0.011, ecapa_loss=0.0002564, whisper_loss=0.0703, over 15044.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0111, ecapa_loss=0.0001849, whisper_loss=0.09265, over 3868894.72 frames. 
], batch size: 68, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:29:50,298 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 from AS 2024-08-12 02:29:52,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1412320.0, ans=0.125 2024-08-12 02:29:52,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2024-08-12 02:29:56,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-12 02:30:05,424 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 02:30:34,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1412620.0, ans=0.125 2024-08-12 02:30:47,082 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 from AS 2024-08-12 02:30:48,410 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 02:30:56,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10850, loss[loss=0.1148, beats_loss=0.01044, ecapa_loss=0.0001655, whisper_loss=0.1027, over 17310.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0111, ecapa_loss=0.0001853, whisper_loss=0.09333, over 3907871.70 frames. ], batch size: 67, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:31:05,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1412820.0, ans=0.2 2024-08-12 02:31:13,056 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
30 from LS+wenet, 22 from Vox, 28 from AS 2024-08-12 02:31:15,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1412920.0, ans=0.0 2024-08-12 02:31:16,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-12 02:31:24,072 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 from AS 2024-08-12 02:31:39,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.708e+01 3.088e+01 3.544e+01 8.247e+01, threshold=6.177e+01, percent-clipped=2.0 2024-08-12 02:31:55,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1413220.0, ans=0.0 2024-08-12 02:32:02,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1413220.0, ans=0.0 2024-08-12 02:32:06,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10900, loss[loss=0.1184, beats_loss=0.01083, ecapa_loss=0.0001646, whisper_loss=0.1059, over 20432.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01115, ecapa_loss=0.000185, whisper_loss=0.09362, over 3954160.57 frames. ], batch size: 80, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:32:11,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.80 vs. limit=22.5 2024-08-12 02:32:17,548 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 02:32:24,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.77 vs. 
limit=15.0 2024-08-12 02:32:24,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1413420.0, ans=0.125 2024-08-12 02:32:27,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1413420.0, ans=0.125 2024-08-12 02:32:50,617 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 02:33:08,729 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 02:33:18,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 10950, loss[loss=0.0921, beats_loss=0.01011, ecapa_loss=0.000171, whisper_loss=0.08028, over 14584.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01112, ecapa_loss=0.0001845, whisper_loss=0.09361, over 3958188.88 frames. ], batch size: 54, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:33:28,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1413820.0, ans=0.0 2024-08-12 02:33:28,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1413820.0, ans=0.125 2024-08-12 02:33:59,814 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:34:00,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.632e+01 3.025e+01 3.424e+01 7.059e+01, threshold=6.051e+01, percent-clipped=1.0 2024-08-12 02:34:02,696 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 02:34:05,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1414120.0, ans=0.015 2024-08-12 02:34:11,135 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 02:34:12,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1414220.0, ans=0.1 2024-08-12 02:34:27,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11000, loss[loss=0.1062, beats_loss=0.009335, ecapa_loss=0.0002034, whisper_loss=0.09486, over 22547.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01103, ecapa_loss=0.000186, whisper_loss=0.09426, over 3971974.22 frames. ], batch size: 89, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:34:34,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 02:34:48,331 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 02:34:55,048 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 02:34:55,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1414520.0, ans=0.0 2024-08-12 02:34:55,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1414520.0, ans=0.125 2024-08-12 02:35:05,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1414520.0, ans=0.125 2024-08-12 02:35:24,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1414720.0, ans=0.0 2024-08-12 02:35:29,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1414720.0, ans=0.0 2024-08-12 02:35:30,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1414720.0, ans=0.0 2024-08-12 02:35:35,795 
INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11050, loss[loss=0.09609, beats_loss=0.01303, ecapa_loss=0.0001647, whisper_loss=0.08142, over 20797.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01106, ecapa_loss=0.0001853, whisper_loss=0.09356, over 3967610.20 frames. ], batch size: 83, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:35:38,489 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 02:35:46,583 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 02:35:49,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1414920.0, ans=0.125 2024-08-12 02:36:05,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.08 vs. limit=22.5 2024-08-12 02:36:18,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.531e+01 2.878e+01 3.285e+01 6.916e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 02:36:20,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1415120.0, ans=0.125 2024-08-12 02:36:22,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-08-12 02:36:23,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1415120.0, ans=0.1 2024-08-12 02:36:35,673 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
19 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 02:36:44,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1415320.0, ans=0.125 2024-08-12 02:36:45,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11100, loss[loss=0.107, beats_loss=0.009748, ecapa_loss=0.0001804, whisper_loss=0.09545, over 23432.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001858, whisper_loss=0.09265, over 3925218.27 frames. ], batch size: 92, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:36:45,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1415320.0, ans=0.0 2024-08-12 02:36:47,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-12 02:36:48,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1415320.0, ans=0.2 2024-08-12 02:37:13,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-12 02:37:17,331 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 02:37:31,416 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 02:37:37,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1415620.0, ans=0.09899494936611666 2024-08-12 02:37:39,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1415620.0, ans=0.125 2024-08-12 02:37:48,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1415720.0, ans=0.125 2024-08-12 02:37:56,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11150, loss[loss=0.1205, beats_loss=0.008417, ecapa_loss=0.0002277, whisper_loss=0.1098, over 16275.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01101, ecapa_loss=0.000187, whisper_loss=0.09311, over 3907167.87 frames. ], batch size: 63, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:38:05,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1415820.0, ans=0.125 2024-08-12 02:38:10,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2024-08-12 02:38:23,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1416020.0, ans=0.5 2024-08-12 02:38:37,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1416120.0, ans=0.125 2024-08-12 02:38:38,407 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 02:38:39,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.570e+01 2.845e+01 3.196e+01 4.459e+01, threshold=5.690e+01, percent-clipped=0.0 2024-08-12 02:38:45,328 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 02:39:06,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1416320.0, ans=0.0 2024-08-12 02:39:06,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11200, loss[loss=0.1132, beats_loss=0.01091, ecapa_loss=0.0001895, whisper_loss=0.1004, over 19299.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.0001873, whisper_loss=0.09236, over 3878506.41 frames. ], batch size: 78, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:39:11,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1416320.0, ans=0.125 2024-08-12 02:39:15,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-12 02:39:21,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1416420.0, ans=0.125 2024-08-12 02:39:25,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1416420.0, ans=0.0 2024-08-12 02:39:30,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1416420.0, ans=0.0 2024-08-12 02:39:43,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
limit=15.0 2024-08-12 02:40:15,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1416820.0, ans=0.1 2024-08-12 02:40:15,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1416820.0, ans=0.125 2024-08-12 02:40:16,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11250, loss[loss=0.08672, beats_loss=0.01094, ecapa_loss=0.000166, whisper_loss=0.07412, over 19165.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.000187, whisper_loss=0.09249, over 3893195.34 frames. ], batch size: 73, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:40:17,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1416820.0, ans=0.025 2024-08-12 02:40:21,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1416820.0, ans=0.0 2024-08-12 02:40:47,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1417020.0, ans=0.125 2024-08-12 02:40:49,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1417020.0, ans=0.125 2024-08-12 02:40:49,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1417020.0, ans=0.125 2024-08-12 02:40:51,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1417020.0, ans=0.125 2024-08-12 02:40:59,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.698e+01 3.076e+01 3.539e+01 6.948e+01, threshold=6.153e+01, percent-clipped=1.0 2024-08-12 02:41:11,386 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1417220.0, ans=0.025 2024-08-12 02:41:22,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1417220.0, ans=0.0 2024-08-12 02:41:23,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1417220.0, ans=0.0 2024-08-12 02:41:25,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11300, loss[loss=0.0884, beats_loss=0.01221, ecapa_loss=0.0001273, whisper_loss=0.07492, over 19163.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.011, ecapa_loss=0.0001864, whisper_loss=0.09292, over 3906111.44 frames. ], batch size: 73, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:41:37,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1417320.0, ans=0.5 2024-08-12 02:41:37,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1417320.0, ans=0.125 2024-08-12 02:41:42,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1417420.0, ans=0.0 2024-08-12 02:41:43,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1417420.0, ans=0.0 2024-08-12 02:41:49,080 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 02:41:56,466 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:42:14,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1417620.0, ans=0.1 2024-08-12 02:42:17,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1417620.0, ans=0.1 2024-08-12 02:42:31,756 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 02:42:34,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1417820.0, ans=0.0 2024-08-12 02:42:35,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11350, loss[loss=0.09875, beats_loss=0.01203, ecapa_loss=0.0001903, whisper_loss=0.08482, over 21891.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.0001859, whisper_loss=0.09275, over 3924258.85 frames. 
], batch size: 92, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:42:37,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1417820.0, ans=0.125 2024-08-12 02:42:48,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1417920.0, ans=0.125 2024-08-12 02:43:10,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1418020.0, ans=0.1 2024-08-12 02:43:17,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1418120.0, ans=0.0 2024-08-12 02:43:18,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.545e+01 2.820e+01 3.202e+01 5.315e+01, threshold=5.639e+01, percent-clipped=0.0 2024-08-12 02:43:18,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1418120.0, ans=0.0 2024-08-12 02:43:26,700 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 02:43:41,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1418220.0, ans=0.09899494936611666 2024-08-12 02:43:44,267 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.597e-02 2024-08-12 02:43:45,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11400, loss[loss=0.1201, beats_loss=0.0119, ecapa_loss=0.0001663, whisper_loss=0.1065, over 23650.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.0001851, whisper_loss=0.09336, over 3929243.68 frames. ], batch size: 91, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:43:49,565 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 02:44:03,726 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.011e+00 2024-08-12 02:44:04,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1418420.0, ans=0.0 2024-08-12 02:44:14,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-08-12 02:44:15,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1418520.0, ans=0.0 2024-08-12 02:44:17,876 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 02:44:20,864 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 02:44:34,872 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 02:44:41,270 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-12 02:44:44,012 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 02:44:44,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1418720.0, ans=0.1 2024-08-12 02:44:53,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11450, loss[loss=0.1119, beats_loss=0.01098, ecapa_loss=0.0001702, whisper_loss=0.09927, over 22189.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01105, ecapa_loss=0.000185, whisper_loss=0.09372, over 3930130.00 frames. 
], batch size: 88, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:45:00,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1418820.0, ans=0.07 2024-08-12 02:45:03,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2024-08-12 02:45:07,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1418920.0, ans=0.1 2024-08-12 02:45:10,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1418920.0, ans=0.125 2024-08-12 02:45:14,397 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:45:24,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1419020.0, ans=0.125 2024-08-12 02:45:27,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1419020.0, ans=0.125 2024-08-12 02:45:36,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.629e+01 3.024e+01 3.484e+01 5.992e+01, threshold=6.048e+01, percent-clipped=1.0 2024-08-12 02:45:37,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1419120.0, ans=0.1 2024-08-12 02:45:38,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1419120.0, ans=0.0 2024-08-12 02:45:49,085 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 02:45:56,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1419220.0, ans=0.0 2024-08-12 02:45:57,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1419220.0, ans=10.0 2024-08-12 02:46:02,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11500, loss[loss=0.1118, beats_loss=0.009617, ecapa_loss=0.0001827, whisper_loss=0.1003, over 24004.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01102, ecapa_loss=0.0001853, whisper_loss=0.09362, over 3924641.77 frames. ], batch size: 95, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:46:06,911 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 02:46:17,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1419420.0, ans=0.125 2024-08-12 02:46:25,619 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 02:46:39,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-12 02:46:40,744 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 02:46:43,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1419620.0, ans=0.125 2024-08-12 02:46:51,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1419620.0, ans=0.125 2024-08-12 02:47:03,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1419720.0, ans=0.125 2024-08-12 02:47:11,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11550, loss[loss=0.07054, beats_loss=0.01203, ecapa_loss=0.0001705, whisper_loss=0.05681, over 14067.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.011, ecapa_loss=0.0001846, whisper_loss=0.09305, over 3869454.89 frames. ], batch size: 56, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:47:12,684 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 02:47:25,148 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 02:47:25,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1419920.0, ans=0.0 2024-08-12 02:47:33,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1419920.0, ans=0.125 2024-08-12 02:47:38,907 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 02:47:45,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-08-12 02:47:48,368 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 02:47:51,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1420120.0, ans=0.1 2024-08-12 02:47:53,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.668e+01 3.016e+01 3.497e+01 6.036e+01, threshold=6.031e+01, percent-clipped=0.0 2024-08-12 02:48:12,673 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 02:48:20,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11600, loss[loss=0.1078, beats_loss=0.01103, ecapa_loss=0.0002221, whisper_loss=0.09451, over 21997.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01105, ecapa_loss=0.0001838, whisper_loss=0.09242, over 3886498.14 frames. ], batch size: 92, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:48:20,838 INFO [train_multi_KD3.py:844] (3/4) A total of 98 cuts. 26 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-12 02:48:21,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1420320.0, ans=0.2 2024-08-12 02:48:23,648 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 02:48:31,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1420320.0, ans=0.125 2024-08-12 02:48:36,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-12 02:48:46,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1420420.0, ans=0.1 2024-08-12 02:48:57,004 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 02:49:02,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1420620.0, ans=0.125 2024-08-12 02:49:02,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-12 02:49:11,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1420620.0, ans=0.125 2024-08-12 02:49:15,682 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 02:49:25,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:29,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11650, loss[loss=0.09982, beats_loss=0.01175, ecapa_loss=0.0001974, whisper_loss=0.08609, over 16689.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01121, ecapa_loss=0.0001837, whisper_loss=0.09192, over 3947078.74 frames. ], batch size: 66, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:49:29,848 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-12 02:49:32,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1420820.0, ans=0.125 2024-08-12 02:49:54,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1420920.0, ans=0.125 2024-08-12 02:49:58,343 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 02:50:01,297 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 02:50:05,267 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-12 02:50:07,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2024-08-12 02:50:12,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.632e+01 2.905e+01 3.202e+01 4.413e+01, threshold=5.810e+01, percent-clipped=0.0 2024-08-12 02:50:38,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11700, loss[loss=0.09434, beats_loss=0.01158, ecapa_loss=0.000133, whisper_loss=0.08142, over 17304.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001827, whisper_loss=0.09203, over 3968662.16 frames. ], batch size: 66, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:50:57,539 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 02:51:11,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1421520.0, ans=0.2 2024-08-12 02:51:39,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1421720.0, ans=0.125 2024-08-12 02:51:46,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11750, loss[loss=0.09184, beats_loss=0.01142, ecapa_loss=0.0001912, whisper_loss=0.0785, over 16841.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01122, ecapa_loss=0.0001837, whisper_loss=0.09182, over 3956839.44 frames. 
], batch size: 69, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:51:54,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1421820.0, ans=10.0 2024-08-12 02:52:04,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1421920.0, ans=0.0 2024-08-12 02:52:15,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1422020.0, ans=0.125 2024-08-12 02:52:29,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.531e+01 2.844e+01 3.355e+01 7.826e+01, threshold=5.688e+01, percent-clipped=1.0 2024-08-12 02:52:40,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1422220.0, ans=0.0 2024-08-12 02:52:43,339 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 02:52:51,351 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 02:52:55,166 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11800, loss[loss=0.07657, beats_loss=0.01495, ecapa_loss=0.0001939, whisper_loss=0.05968, over 15279.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01127, ecapa_loss=0.0001838, whisper_loss=0.09214, over 3981962.57 frames. ], batch size: 64, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:53:08,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1422420.0, ans=0.125 2024-08-12 02:53:09,938 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 02:53:17,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1422420.0, ans=0.0 2024-08-12 02:53:18,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1422420.0, ans=10.0 2024-08-12 02:54:04,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11850, loss[loss=0.1141, beats_loss=0.01036, ecapa_loss=0.0002182, whisper_loss=0.1016, over 19411.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01118, ecapa_loss=0.0001834, whisper_loss=0.09332, over 3989931.50 frames. ], batch size: 77, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:54:14,398 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 02:54:47,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.632e+01 2.955e+01 3.333e+01 2.077e+02, threshold=5.910e+01, percent-clipped=1.0 2024-08-12 02:54:53,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1423120.0, ans=0.125 2024-08-12 02:54:53,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1423120.0, ans=0.125 2024-08-12 02:55:01,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1423220.0, ans=0.125 2024-08-12 02:55:12,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11900, loss[loss=0.1134, beats_loss=0.01253, ecapa_loss=0.0001644, whisper_loss=0.09927, over 22646.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01123, ecapa_loss=0.0001824, whisper_loss=0.09348, over 3980636.21 frames. 
], batch size: 89, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:55:22,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1423320.0, ans=0.125 2024-08-12 02:55:23,811 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 02:55:29,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1423420.0, ans=0.125 2024-08-12 02:55:40,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1423520.0, ans=0.2 2024-08-12 02:55:49,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-12 02:55:50,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1423520.0, ans=0.05 2024-08-12 02:56:01,670 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-12 02:56:18,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1423720.0, ans=0.0 2024-08-12 02:56:22,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 11950, loss[loss=0.1371, beats_loss=0.008555, ecapa_loss=0.000191, whisper_loss=0.1266, over 20383.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01114, ecapa_loss=0.0001831, whisper_loss=0.09432, over 3983609.01 frames. 
], batch size: 76, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:56:28,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1423820.0, ans=0.125 2024-08-12 02:56:28,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-08-12 02:56:34,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=22.5 2024-08-12 02:56:35,267 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 02:56:43,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1423920.0, ans=0.05 2024-08-12 02:56:56,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1424020.0, ans=0.1 2024-08-12 02:56:59,584 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.189e-02 2024-08-12 02:57:01,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1424020.0, ans=0.125 2024-08-12 02:57:02,323 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 02:57:06,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.496e+01 2.723e+01 3.288e+01 6.365e+01, threshold=5.445e+01, percent-clipped=1.0 2024-08-12 02:57:08,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-12 02:57:10,440 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 02:57:11,798 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 02:57:27,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1424220.0, ans=0.0 2024-08-12 02:57:30,454 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 02:57:31,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12000, loss[loss=0.1137, beats_loss=0.009942, ecapa_loss=0.000192, whisper_loss=0.1019, over 23382.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.000183, whisper_loss=0.09395, over 3972620.87 frames. ], batch size: 93, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:57:31,487 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 02:58:10,741 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006161, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 02:58:28,788 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on SV_voxceleb1: loss=0.005027, beats_loss=0, ecapa_loss=0.0005027, whisper_loss=0, over 939242.00 frames. 2024-08-12 02:59:53,027 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0009, 0.0466, 0.0027, 0.0204, 0.0042, 0.0775, 0.0381, 0.0563], device='cuda:3') 2024-08-12 03:00:26,511 INFO [train_multi_KD3.py:1149] (3/4) Epoch 10, validation on AT_audioset: loss=0.02469, beats_loss=0.02469, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 03:00:26,515 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 03:00:35,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1424320.0, ans=0.125 2024-08-12 03:00:43,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1424420.0, ans=0.0 2024-08-12 03:00:47,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1424420.0, ans=0.125 2024-08-12 03:00:49,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1424420.0, ans=0.2 2024-08-12 03:00:58,243 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 03:01:02,008 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 03:01:03,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1424520.0, ans=0.2 2024-08-12 03:01:03,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1424520.0, ans=0.125 2024-08-12 03:01:36,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12050, loss[loss=0.106, beats_loss=0.01215, ecapa_loss=0.0001565, whisper_loss=0.09232, over 21289.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01112, ecapa_loss=0.0001822, whisper_loss=0.09331, over 3960295.02 frames. ], batch size: 84, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:01:44,891 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 03:01:46,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1424820.0, ans=0.125 2024-08-12 03:02:05,874 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 03:02:07,178 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 03:02:19,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.644e+01 2.915e+01 3.248e+01 4.728e+01, threshold=5.830e+01, percent-clipped=0.0 2024-08-12 03:02:21,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1425120.0, ans=0.125 2024-08-12 03:02:42,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1425220.0, ans=0.0 2024-08-12 03:02:45,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12100, loss[loss=0.1022, beats_loss=0.01242, ecapa_loss=0.0001763, whisper_loss=0.088, over 22296.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01111, ecapa_loss=0.0001827, whisper_loss=0.09302, over 3937119.50 frames. ], batch size: 93, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:03:02,774 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-12 03:03:04,276 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 03:03:05,619 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 03:03:10,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2024-08-12 03:03:26,291 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:03:29,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.04 vs. limit=5.0 2024-08-12 03:03:43,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1425720.0, ans=0.125 2024-08-12 03:03:53,869 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 03:03:54,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1425820.0, ans=0.0 2024-08-12 03:03:54,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12150, loss[loss=0.1032, beats_loss=0.01171, ecapa_loss=0.0001423, whisper_loss=0.09008, over 22659.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01111, ecapa_loss=0.0001828, whisper_loss=0.09286, over 3927769.57 frames. ], batch size: 87, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:04:13,309 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 03:04:34,122 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 03:04:38,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.674e+01 3.067e+01 3.443e+01 6.340e+01, threshold=6.135e+01, percent-clipped=1.0 2024-08-12 03:04:38,364 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 03:04:42,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1426120.0, ans=0.0 2024-08-12 03:04:51,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-12 03:04:52,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1426220.0, ans=0.125 2024-08-12 03:05:04,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12200, loss[loss=0.1182, beats_loss=0.01151, ecapa_loss=0.0001468, whisper_loss=0.1052, over 24069.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01113, ecapa_loss=0.0001808, whisper_loss=0.09291, over 3930831.44 frames. ], batch size: 93, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:05:04,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1426320.0, ans=0.0 2024-08-12 03:05:15,068 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 03:05:17,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-12 03:05:20,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1426420.0, ans=0.0 2024-08-12 03:05:27,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426420.0, ans=0.125 2024-08-12 03:05:31,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1426520.0, ans=0.1 2024-08-12 03:06:02,299 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 03:06:05,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1426720.0, ans=0.125 2024-08-12 03:06:08,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426720.0, ans=0.125 2024-08-12 03:06:13,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12250, loss[loss=0.09293, beats_loss=0.01264, ecapa_loss=0.0001835, whisper_loss=0.07845, over 21988.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.0001816, whisper_loss=0.09282, over 3913634.16 frames. ], batch size: 90, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:06:18,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-12 03:06:47,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. 
limit=15.0 2024-08-12 03:06:50,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1427020.0, ans=0.125 2024-08-12 03:06:56,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.672e+01 2.930e+01 3.249e+01 5.324e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-12 03:06:58,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1427120.0, ans=0.125 2024-08-12 03:07:04,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1427120.0, ans=0.5 2024-08-12 03:07:13,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1427220.0, ans=0.1 2024-08-12 03:07:18,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1427220.0, ans=0.125 2024-08-12 03:07:23,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12300, loss[loss=0.1038, beats_loss=0.01101, ecapa_loss=0.0001812, whisper_loss=0.09096, over 22533.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01103, ecapa_loss=0.0001827, whisper_loss=0.09324, over 3927486.72 frames. ], batch size: 91, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:07:41,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1427420.0, ans=0.0 2024-08-12 03:07:53,767 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 03:07:56,514 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 03:08:12,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1427620.0, ans=0.0 2024-08-12 03:08:31,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1427820.0, ans=0.0 2024-08-12 03:08:32,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12350, loss[loss=0.08817, beats_loss=0.01321, ecapa_loss=0.0001902, whisper_loss=0.07306, over 15286.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.0001849, whisper_loss=0.09243, over 3897996.79 frames. ], batch size: 62, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:08:32,844 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 03:08:40,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1427820.0, ans=0.125 2024-08-12 03:09:06,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1428020.0, ans=0.1 2024-08-12 03:09:18,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1428120.0, ans=0.1 2024-08-12 03:09:18,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.674e+01 3.021e+01 3.383e+01 7.125e+01, threshold=6.043e+01, percent-clipped=2.0 2024-08-12 03:09:32,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1428220.0, ans=0.125 2024-08-12 03:09:32,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1428220.0, ans=0.125 2024-08-12 03:09:48,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12400, loss[loss=0.09411, beats_loss=0.01323, ecapa_loss=0.0001539, whisper_loss=0.07934, over 14284.00 
frames. ], tot_loss[loss=0.1052, beats_loss=0.01102, ecapa_loss=0.0001859, whisper_loss=0.09227, over 3888668.11 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:09:59,597 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 03:10:09,681 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:10:12,275 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 03:10:18,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1428520.0, ans=0.0 2024-08-12 03:10:22,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1428520.0, ans=0.05 2024-08-12 03:10:43,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1428620.0, ans=0.125 2024-08-12 03:10:45,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-12 03:10:49,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1428720.0, ans=0.125 2024-08-12 03:10:49,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1428720.0, ans=0.0 2024-08-12 03:10:58,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1428720.0, ans=0.0 2024-08-12 03:10:59,764 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 03:11:02,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12450, loss[loss=0.09278, beats_loss=0.01204, ecapa_loss=0.0001959, whisper_loss=0.07878, over 17433.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001862, whisper_loss=0.09174, over 3914602.63 frames. ], batch size: 69, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:11:03,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-12 03:11:24,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1428920.0, ans=0.0 2024-08-12 03:11:27,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1428920.0, ans=0.125 2024-08-12 03:11:34,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1429020.0, ans=0.125 2024-08-12 03:11:38,233 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 03:11:39,517 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 03:11:44,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1429120.0, ans=0.0 2024-08-12 03:11:46,534 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+01 2.502e+01 2.764e+01 3.282e+01 5.590e+01, threshold=5.528e+01, percent-clipped=0.0 2024-08-12 03:11:50,242 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
30 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-12 03:11:53,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1429120.0, ans=0.05 2024-08-12 03:11:56,553 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:12:08,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1429220.0, ans=0.2 2024-08-12 03:12:11,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1429220.0, ans=0.125 2024-08-12 03:12:14,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12500, loss[loss=0.1111, beats_loss=0.01167, ecapa_loss=0.0001755, whisper_loss=0.0977, over 23046.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001841, whisper_loss=0.09159, over 3934396.42 frames. ], batch size: 91, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:12:23,336 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 03:12:29,147 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 03:12:29,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1429420.0, ans=0.125 2024-08-12 03:12:40,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429420.0, ans=0.1 2024-08-12 03:12:43,644 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 03:12:44,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1429520.0, ans=0.2 2024-08-12 03:12:44,996 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 03:13:04,327 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 03:13:04,922 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.112e-03 2024-08-12 03:13:15,524 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 03:13:27,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12550, loss[loss=0.09472, beats_loss=0.01259, ecapa_loss=0.0002039, whisper_loss=0.08009, over 20294.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001831, whisper_loss=0.09232, over 3956698.81 frames. ], batch size: 85, lr: 6.20e-03, grad_scale: 2.305843009213694e+18 2024-08-12 03:13:37,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1429820.0, ans=0.2 2024-08-12 03:13:37,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=15.0 2024-08-12 03:13:45,276 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 03:14:04,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1430020.0, ans=0.1 2024-08-12 03:14:12,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.663e+01 2.938e+01 3.317e+01 5.229e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 03:14:23,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1430220.0, ans=0.0 2024-08-12 03:14:24,731 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.472e+01 2024-08-12 03:14:29,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1430220.0, ans=0.1 2024-08-12 03:14:38,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12600, loss[loss=0.1086, beats_loss=0.0116, ecapa_loss=0.0001813, whisper_loss=0.09515, over 21164.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001846, whisper_loss=0.09276, over 3944894.84 frames. ], batch size: 86, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:14:44,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1430320.0, ans=0.125 2024-08-12 03:14:46,767 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 03:14:49,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1430320.0, ans=0.125 2024-08-12 03:15:09,902 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 03:15:14,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-12 03:15:45,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1430720.0, ans=0.125 2024-08-12 03:15:52,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12650, loss[loss=0.101, beats_loss=0.009659, ecapa_loss=0.0001659, whisper_loss=0.08968, over 15444.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001844, whisper_loss=0.09201, over 3939140.41 frames. ], batch size: 59, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:15:54,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1430820.0, ans=0.0 2024-08-12 03:16:04,354 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 03:16:21,004 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-12 03:16:38,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1431120.0, ans=0.2 2024-08-12 03:16:38,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.672e+01 3.119e+01 3.630e+01 6.657e+01, threshold=6.239e+01, percent-clipped=2.0 2024-08-12 03:17:01,676 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 24 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 03:17:05,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12700, loss[loss=0.1055, beats_loss=0.01095, ecapa_loss=0.0002178, whisper_loss=0.09238, over 21717.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001844, whisper_loss=0.09248, over 3935697.96 frames. 
], batch size: 89, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:17:07,333 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 03:17:07,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1431320.0, ans=0.0 2024-08-12 03:17:07,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1431320.0, ans=0.125 2024-08-12 03:17:10,154 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 03:17:14,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1431320.0, ans=0.05 2024-08-12 03:17:15,875 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 03:17:32,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2024-08-12 03:17:33,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1431520.0, ans=0.1 2024-08-12 03:17:36,630 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 03:17:38,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1431520.0, ans=0.125 2024-08-12 03:17:46,405 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 03:17:59,886 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 03:18:18,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12750, loss[loss=0.1229, beats_loss=0.01169, ecapa_loss=0.000137, whisper_loss=0.1099, over 14898.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0112, ecapa_loss=0.0001845, whisper_loss=0.09202, over 3884539.80 frames. ], batch size: 54, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:18:18,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1431820.0, ans=0.0 2024-08-12 03:18:34,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1431920.0, ans=0.0 2024-08-12 03:18:39,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1431920.0, ans=0.0 2024-08-12 03:18:41,778 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 03:18:59,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1432120.0, ans=0.125 2024-08-12 03:19:01,812 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 03:19:02,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.99 vs. 
limit=15.0 2024-08-12 03:19:02,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.558e+01 2.840e+01 3.489e+01 4.506e+01, threshold=5.680e+01, percent-clipped=0.0 2024-08-12 03:19:16,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1432220.0, ans=0.125 2024-08-12 03:19:22,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1432220.0, ans=0.0 2024-08-12 03:19:25,905 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 03:19:29,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12800, loss[loss=0.09513, beats_loss=0.009988, ecapa_loss=0.0002245, whisper_loss=0.08289, over 16525.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01115, ecapa_loss=0.0001853, whisper_loss=0.09234, over 3895287.26 frames. ], batch size: 68, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:19:31,606 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 03:19:37,078 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-12 03:19:40,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1432320.0, ans=0.125 2024-08-12 03:20:06,388 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 03:20:21,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1432620.0, ans=0.125 2024-08-12 03:20:31,088 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
14 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 03:20:39,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12850, loss[loss=0.0785, beats_loss=0.008797, ecapa_loss=0.0002061, whisper_loss=0.06764, over 13506.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01111, ecapa_loss=0.0001866, whisper_loss=0.09227, over 3845325.10 frames. ], batch size: 54, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:21:20,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2024-08-12 03:21:23,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.483e+01 2.799e+01 3.175e+01 4.760e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-12 03:21:26,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1433120.0, ans=0.2 2024-08-12 03:21:34,986 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 03:21:40,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1433220.0, ans=0.125 2024-08-12 03:21:45,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1433220.0, ans=0.125 2024-08-12 03:21:48,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12900, loss[loss=0.09023, beats_loss=0.0117, ecapa_loss=0.0001827, whisper_loss=0.07671, over 15784.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01119, ecapa_loss=0.000186, whisper_loss=0.0908, over 3847612.75 frames. 
], batch size: 63, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:21:58,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1433320.0, ans=0.015 2024-08-12 03:22:07,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=12.0 2024-08-12 03:22:08,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1433420.0, ans=0.2 2024-08-12 03:22:10,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.88 vs. limit=12.0 2024-08-12 03:22:21,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-12 03:22:27,437 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 03:22:54,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1433720.0, ans=0.125 2024-08-12 03:22:58,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 12950, loss[loss=0.1035, beats_loss=0.01326, ecapa_loss=0.0001402, whisper_loss=0.08886, over 19791.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01113, ecapa_loss=0.0001861, whisper_loss=0.09106, over 3841280.76 frames. ], batch size: 77, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:23:05,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1433820.0, ans=0.125 2024-08-12 03:23:07,474 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 03:23:09,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1433820.0, ans=0.125 2024-08-12 03:23:13,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:15,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:18,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1433920.0, ans=0.1 2024-08-12 03:23:19,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1433920.0, ans=0.125 2024-08-12 03:23:45,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.584e+01 3.018e+01 3.555e+01 5.734e+01, threshold=6.036e+01, percent-clipped=3.0 2024-08-12 03:23:46,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1434120.0, ans=0.125 2024-08-12 03:24:03,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1434220.0, ans=0.2 2024-08-12 03:24:03,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-12 03:24:06,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1434220.0, ans=0.0 2024-08-12 03:24:08,829 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 03:24:09,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1434220.0, ans=0.05 2024-08-12 03:24:11,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13000, loss[loss=0.1324, beats_loss=0.008718, ecapa_loss=0.0001676, whisper_loss=0.122, over 16665.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01114, ecapa_loss=0.0001858, whisper_loss=0.09157, over 3870421.80 frames. ], batch size: 63, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:24:18,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1434320.0, ans=0.125 2024-08-12 03:24:38,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1434420.0, ans=0.07 2024-08-12 03:25:02,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1434620.0, ans=0.125 2024-08-12 03:25:24,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13050, loss[loss=0.1386, beats_loss=0.006341, ecapa_loss=0.0002158, whisper_loss=0.1301, over 15758.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.0001854, whisper_loss=0.09236, over 3869129.97 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:25:40,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1434920.0, ans=0.125 2024-08-12 03:25:45,393 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 03:25:45,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1434920.0, ans=0.0 2024-08-12 03:25:48,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1434920.0, ans=0.125 2024-08-12 03:25:53,312 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 03:26:12,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.574e+01 2.930e+01 3.375e+01 4.949e+01, threshold=5.859e+01, percent-clipped=0.0 2024-08-12 03:26:30,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1435220.0, ans=0.125 2024-08-12 03:26:31,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1435220.0, ans=0.0 2024-08-12 03:26:34,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1435220.0, ans=0.2 2024-08-12 03:26:36,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1435220.0, ans=0.0 2024-08-12 03:26:38,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=15.0 2024-08-12 03:26:41,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13100, loss[loss=0.1242, beats_loss=0.009295, ecapa_loss=0.0001955, whisper_loss=0.113, over 23516.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001849, whisper_loss=0.09176, over 3880228.98 frames. 
], batch size: 93, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:26:50,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1435320.0, ans=0.1 2024-08-12 03:27:08,464 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 03:27:10,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-12 03:27:16,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1435520.0, ans=0.125 2024-08-12 03:27:26,907 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 03:27:45,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1435720.0, ans=0.125 2024-08-12 03:27:56,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13150, loss[loss=0.1095, beats_loss=0.01162, ecapa_loss=0.0001721, whisper_loss=0.09612, over 22403.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01115, ecapa_loss=0.000184, whisper_loss=0.09144, over 3849385.82 frames. ], batch size: 88, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:28:04,127 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-12 03:28:08,190 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 03:28:37,694 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 03:28:43,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.467e+01 2.835e+01 3.173e+01 4.953e+01, threshold=5.670e+01, percent-clipped=0.0 2024-08-12 03:28:50,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1436120.0, ans=0.125 2024-08-12 03:28:51,569 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-12 03:28:55,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=12.0 2024-08-12 03:28:59,082 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 03:28:59,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1436220.0, ans=0.0 2024-08-12 03:28:59,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=12.0 2024-08-12 03:29:08,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13200, loss[loss=0.09953, beats_loss=0.01045, ecapa_loss=0.0001904, whisper_loss=0.08718, over 21757.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01108, ecapa_loss=0.0001826, whisper_loss=0.09197, over 3849261.33 frames. 
], batch size: 89, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:29:13,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1436320.0, ans=0.015 2024-08-12 03:29:33,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1436420.0, ans=0.05 2024-08-12 03:29:49,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1436520.0, ans=0.125 2024-08-12 03:30:01,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1436620.0, ans=0.2 2024-08-12 03:30:04,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1436620.0, ans=0.125 2024-08-12 03:30:10,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1436720.0, ans=0.0 2024-08-12 03:30:11,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1436720.0, ans=0.125 2024-08-12 03:30:22,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13250, loss[loss=0.08365, beats_loss=0.01184, ecapa_loss=0.0002047, whisper_loss=0.06976, over 17610.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001829, whisper_loss=0.09245, over 3840248.45 frames. ], batch size: 71, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:30:30,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1436820.0, ans=0.05 2024-08-12 03:30:34,601 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-12 03:30:39,370 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 03:30:41,008 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 03:30:43,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=22.5 2024-08-12 03:30:43,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1436920.0, ans=0.125 2024-08-12 03:30:49,940 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 03:30:53,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1437020.0, ans=0.1 2024-08-12 03:30:54,109 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 03:30:59,987 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 03:31:00,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1437020.0, ans=0.125 2024-08-12 03:31:10,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.496e+01 2.755e+01 3.152e+01 5.278e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-12 03:31:20,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1437120.0, ans=0.0 2024-08-12 03:31:37,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13300, loss[loss=0.1197, beats_loss=0.009687, ecapa_loss=0.0001845, whisper_loss=0.1082, over 15187.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01101, ecapa_loss=0.0001822, whisper_loss=0.09246, over 3881549.68 frames. 
], batch size: 58, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:31:39,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-12 03:32:00,212 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 03:32:12,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1437520.0, ans=0.0 2024-08-12 03:32:39,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1437720.0, ans=0.0 2024-08-12 03:32:50,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13350, loss[loss=0.1055, beats_loss=0.01045, ecapa_loss=0.0001945, whisper_loss=0.09309, over 20341.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01099, ecapa_loss=0.0001817, whisper_loss=0.0934, over 3861661.94 frames. ], batch size: 84, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:33:21,238 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 03:33:25,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1438020.0, ans=0.125 2024-08-12 03:33:27,162 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 03:33:28,537 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 37 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 03:33:37,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1438120.0, ans=0.0 2024-08-12 03:33:37,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. 
limit=6.0 2024-08-12 03:33:37,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.851e+01 3.185e+01 1.772e+02, threshold=5.702e+01, percent-clipped=1.0 2024-08-12 03:33:38,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-12 03:33:41,227 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 03:33:51,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1438220.0, ans=15.0 2024-08-12 03:33:59,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=22.5 2024-08-12 03:34:04,015 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13400, loss[loss=0.1156, beats_loss=0.008679, ecapa_loss=0.0001578, whisper_loss=0.1053, over 18230.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001827, whisper_loss=0.09253, over 3869180.16 frames. ], batch size: 67, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:34:17,685 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 03:34:29,384 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 03:34:37,026 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 03:34:47,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2024-08-12 03:34:55,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. 
limit=15.0 2024-08-12 03:35:15,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13450, loss[loss=0.1127, beats_loss=0.01099, ecapa_loss=0.0001659, whisper_loss=0.1, over 19052.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01125, ecapa_loss=0.0001817, whisper_loss=0.09135, over 3902159.11 frames. ], batch size: 75, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:35:36,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1438920.0, ans=0.1 2024-08-12 03:35:53,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1439020.0, ans=0.0 2024-08-12 03:36:02,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.531e+01 2.871e+01 3.206e+01 5.320e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-12 03:36:05,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2024-08-12 03:36:05,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1439120.0, ans=0.0 2024-08-12 03:36:11,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1439120.0, ans=0.5 2024-08-12 03:36:17,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1439220.0, ans=0.1 2024-08-12 03:36:26,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1439220.0, ans=0.125 2024-08-12 03:36:29,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13500, loss[loss=0.1088, beats_loss=0.0099, ecapa_loss=0.0001868, whisper_loss=0.09705, over 19829.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.0113, ecapa_loss=0.0001816, whisper_loss=0.09182, over 3906156.64 frames. ], batch size: 78, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:36:30,921 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 03:36:33,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2024-08-12 03:36:59,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1439520.0, ans=0.125 2024-08-12 03:37:09,972 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-12 03:37:27,661 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 03:37:36,330 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 03:37:41,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13550, loss[loss=0.1007, beats_loss=0.01364, ecapa_loss=0.0001473, whisper_loss=0.08554, over 20887.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01128, ecapa_loss=0.0001813, whisper_loss=0.09135, over 3869431.68 frames. ], batch size: 81, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:37:41,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1439820.0, ans=0.025 2024-08-12 03:37:44,251 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 03:37:48,511 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:37:48,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=15.0 2024-08-12 03:37:53,608 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 03:38:02,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1439920.0, ans=0.1 2024-08-12 03:38:10,290 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 03:38:14,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1440020.0, ans=0.125 2024-08-12 03:38:22,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1440020.0, ans=0.2 2024-08-12 03:38:28,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.568e+01 2.866e+01 3.422e+01 5.610e+01, threshold=5.733e+01, percent-clipped=0.0 2024-08-12 03:38:31,419 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 03:38:33,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1440120.0, ans=0.0 2024-08-12 03:38:38,447 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 03:38:47,075 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 03:38:53,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13600, loss[loss=0.1409, beats_loss=0.007062, ecapa_loss=0.000199, whisper_loss=0.1318, over 19361.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01126, ecapa_loss=0.0001818, whisper_loss=0.09161, over 3862592.87 frames. 
], batch size: 73, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:39:00,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1440320.0, ans=0.125 2024-08-12 03:39:03,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1440320.0, ans=0.1 2024-08-12 03:39:06,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1440420.0, ans=0.125 2024-08-12 03:39:08,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-12 03:39:12,027 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 03:39:28,184 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 03:39:34,494 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 03:39:37,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1440620.0, ans=0.0 2024-08-12 03:39:39,771 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-12 03:40:02,806 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 03:40:05,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13650, loss[loss=0.1246, beats_loss=0.0095, ecapa_loss=0.0001701, whisper_loss=0.1134, over 23036.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01124, ecapa_loss=0.0001813, whisper_loss=0.09235, over 3867037.54 frames. 
], batch size: 88, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:40:19,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1440920.0, ans=0.2 2024-08-12 03:40:19,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-12 03:40:23,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1440920.0, ans=0.04949747468305833 2024-08-12 03:40:30,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1440920.0, ans=0.0 2024-08-12 03:40:50,817 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.520e+01 2.826e+01 3.243e+01 5.319e+01, threshold=5.652e+01, percent-clipped=0.0 2024-08-12 03:40:51,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1441120.0, ans=0.1 2024-08-12 03:41:00,331 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 03:41:06,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1441220.0, ans=0.1 2024-08-12 03:41:10,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.03 vs. limit=10.0 2024-08-12 03:41:17,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13700, loss[loss=0.09834, beats_loss=0.01139, ecapa_loss=0.0001869, whisper_loss=0.08508, over 21062.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01132, ecapa_loss=0.0001817, whisper_loss=0.09169, over 3901635.29 frames. 
], batch size: 85, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:41:20,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1441320.0, ans=0.125 2024-08-12 03:41:36,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1441420.0, ans=0.125 2024-08-12 03:41:41,400 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 03:41:45,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0 2024-08-12 03:42:00,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-12 03:42:07,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1441620.0, ans=15.0 2024-08-12 03:42:14,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1441720.0, ans=0.0 2024-08-12 03:42:21,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1441720.0, ans=0.0 2024-08-12 03:42:27,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13750, loss[loss=0.08205, beats_loss=0.01556, ecapa_loss=0.0001317, whisper_loss=0.06517, over 17946.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01125, ecapa_loss=0.000182, whisper_loss=0.09211, over 3881593.21 frames. ], batch size: 74, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:42:36,833 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 03:43:00,552 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 03:43:05,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1442020.0, ans=0.1 2024-08-12 03:43:10,384 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 03:43:11,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.531e+01 2.738e+01 3.278e+01 4.185e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 03:43:18,474 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 03:43:19,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1442120.0, ans=0.0 2024-08-12 03:43:27,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1442220.0, ans=0.125 2024-08-12 03:43:38,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13800, loss[loss=0.1185, beats_loss=0.01094, ecapa_loss=0.0001918, whisper_loss=0.1056, over 20672.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01123, ecapa_loss=0.000182, whisper_loss=0.09238, over 3865858.02 frames. ], batch size: 85, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:43:49,830 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 03:44:17,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1442520.0, ans=0.1 2024-08-12 03:44:40,552 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 03:44:48,818 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 03:44:50,159 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 03:44:50,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2024-08-12 03:44:51,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13850, loss[loss=0.118, beats_loss=0.009323, ecapa_loss=0.0001724, whisper_loss=0.107, over 15221.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01126, ecapa_loss=0.0001823, whisper_loss=0.09166, over 3894336.74 frames. ], batch size: 57, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:44:52,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1442820.0, ans=0.125 2024-08-12 03:45:04,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1442920.0, ans=0.1 2024-08-12 03:45:11,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-08-12 03:45:22,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1443020.0, ans=0.125 2024-08-12 03:45:29,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. 
limit=15.0 2024-08-12 03:45:38,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.591e+01 3.040e+01 3.441e+01 5.923e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-12 03:45:54,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1443220.0, ans=0.125 2024-08-12 03:46:04,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13900, loss[loss=0.1156, beats_loss=0.01377, ecapa_loss=0.0001402, whisper_loss=0.1004, over 22841.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01131, ecapa_loss=0.0001799, whisper_loss=0.09158, over 3882338.87 frames. ], batch size: 88, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:46:10,421 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-12 03:46:18,656 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 03:46:20,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1443420.0, ans=0.125 2024-08-12 03:46:20,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1443420.0, ans=0.125 2024-08-12 03:46:29,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1443420.0, ans=0.0 2024-08-12 03:46:34,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1443520.0, ans=0.2 2024-08-12 03:46:35,880 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 03:46:38,745 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 03:46:41,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1443520.0, ans=0.0 2024-08-12 03:46:45,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1443620.0, ans=0.125 2024-08-12 03:46:56,886 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 03:47:03,663 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 03:47:07,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. limit=10.0 2024-08-12 03:47:14,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 13950, loss[loss=0.109, beats_loss=0.01022, ecapa_loss=0.0001945, whisper_loss=0.09687, over 21155.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01131, ecapa_loss=0.0001799, whisper_loss=0.0917, over 3905751.42 frames. ], batch size: 82, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:47:21,998 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.709e+02 2024-08-12 03:47:22,953 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 03:47:36,270 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 03:47:37,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1443920.0, ans=0.125 2024-08-12 03:47:48,617 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 03:47:53,846 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 03:47:59,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.550e+01 2.827e+01 3.293e+01 5.052e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 03:48:04,634 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 03:48:10,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1444220.0, ans=0.0 2024-08-12 03:48:13,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-12 03:48:13,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-12 03:48:24,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14000, loss[loss=0.09249, beats_loss=0.01031, ecapa_loss=0.0002009, whisper_loss=0.08017, over 20554.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01124, ecapa_loss=0.0001804, whisper_loss=0.09188, over 3909828.22 frames. ], batch size: 81, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:48:36,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1444320.0, ans=0.0 2024-08-12 03:48:38,708 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 03:48:51,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1444520.0, ans=0.125 2024-08-12 03:48:56,341 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 03:49:08,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1444620.0, ans=0.125 2024-08-12 03:49:11,858 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 03:49:31,117 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 03:49:32,417 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 03:49:34,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14050, loss[loss=0.1172, beats_loss=0.009188, ecapa_loss=0.0002794, whisper_loss=0.1052, over 14866.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001796, whisper_loss=0.09264, over 3926893.07 frames. ], batch size: 66, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:49:36,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-12 03:49:49,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1444920.0, ans=0.125 2024-08-12 03:49:58,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1444920.0, ans=0.125 2024-08-12 03:50:09,853 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 03:50:19,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.615e+01 2.934e+01 3.537e+01 1.110e+02, threshold=5.868e+01, percent-clipped=2.0 2024-08-12 03:50:31,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1445220.0, ans=0.125 2024-08-12 03:50:32,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1445220.0, ans=0.2 2024-08-12 03:50:44,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14100, loss[loss=0.09081, beats_loss=0.01004, ecapa_loss=0.0002003, whisper_loss=0.07876, over 14928.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01118, ecapa_loss=0.0001803, whisper_loss=0.0924, over 3896158.29 frames. ], batch size: 59, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:50:50,415 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 03:50:53,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-12 03:51:03,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1445420.0, ans=0.0 2024-08-12 03:51:16,664 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-12 03:51:16,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1445520.0, ans=0.1 2024-08-12 03:51:28,858 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.187e-01 2024-08-12 03:51:47,984 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 03:51:53,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14150, loss[loss=0.1209, beats_loss=0.007992, ecapa_loss=0.0002115, whisper_loss=0.1108, over 16268.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01122, ecapa_loss=0.000181, whisper_loss=0.09254, over 3903986.60 frames. ], batch size: 64, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:51:56,184 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-12 03:51:56,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1445820.0, ans=0.125 2024-08-12 03:51:57,497 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 03:52:06,485 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.703e+01 2024-08-12 03:52:25,081 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 03:52:36,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.480e+01 2.708e+01 3.118e+01 5.988e+01, threshold=5.416e+01, percent-clipped=1.0 2024-08-12 03:52:44,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1446120.0, ans=0.07 2024-08-12 03:52:52,643 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 03:53:02,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14200, loss[loss=0.09509, beats_loss=0.01218, ecapa_loss=0.0001829, whisper_loss=0.08108, over 22202.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001817, whisper_loss=0.09298, over 3905445.57 frames. 
], batch size: 91, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:53:12,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1446320.0, ans=0.125 2024-08-12 03:53:15,638 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 03:53:19,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1446420.0, ans=0.2 2024-08-12 03:53:27,032 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 03:53:38,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1446520.0, ans=0.09899494936611666 2024-08-12 03:53:38,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1446520.0, ans=0.2 2024-08-12 03:53:50,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-12 03:53:51,051 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-12 03:53:52,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1446620.0, ans=0.0 2024-08-12 03:54:00,556 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 03:54:05,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=22.5 2024-08-12 03:54:08,928 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 03:54:12,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14250, loss[loss=0.09443, beats_loss=0.009827, ecapa_loss=0.00018, whisper_loss=0.0828, over 18058.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01101, ecapa_loss=0.0001827, whisper_loss=0.09365, over 3924518.01 frames. ], batch size: 69, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:54:13,345 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 03:54:19,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-12 03:54:21,733 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 03:54:38,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1446920.0, ans=0.1 2024-08-12 03:54:46,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1447020.0, ans=0.125 2024-08-12 03:54:52,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1447020.0, ans=0.125 2024-08-12 03:54:56,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1447120.0, ans=0.125 2024-08-12 03:54:58,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.831e+01 3.136e+01 3.486e+01 5.154e+01, threshold=6.272e+01, percent-clipped=0.0 2024-08-12 03:55:11,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-12 03:55:18,759 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 03:55:23,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14300, loss[loss=0.1176, beats_loss=0.01013, ecapa_loss=0.0001757, whisper_loss=0.1057, over 19625.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01095, ecapa_loss=0.0001829, whisper_loss=0.09346, over 3913285.30 frames. ], batch size: 76, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:55:33,971 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 12 from Vox, 49 fro AS 2024-08-12 03:56:22,721 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 03:56:31,061 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 03:56:32,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14350, loss[loss=0.1141, beats_loss=0.01107, ecapa_loss=0.0001745, whisper_loss=0.1013, over 19372.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01098, ecapa_loss=0.0001825, whisper_loss=0.0935, over 3952550.69 frames. ], batch size: 77, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:56:44,611 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 03:56:49,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1447920.0, ans=0.125 2024-08-12 03:56:53,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1447920.0, ans=0.125 2024-08-12 03:57:13,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1448020.0, ans=0.0 2024-08-12 03:57:17,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.654e+01 2.989e+01 3.360e+01 6.544e+01, threshold=5.979e+01, percent-clipped=1.0 2024-08-12 03:57:19,449 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 03:57:21,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1448120.0, ans=0.125 2024-08-12 03:57:36,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1448220.0, ans=0.125 2024-08-12 03:57:43,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14400, loss[loss=0.09991, beats_loss=0.009087, ecapa_loss=0.0001807, whisper_loss=0.08901, over 16950.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0111, ecapa_loss=0.0001828, whisper_loss=0.09275, over 3982113.14 frames. ], batch size: 66, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:57:46,247 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 03:57:50,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1448320.0, ans=0.0 2024-08-12 03:57:50,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=15.0 2024-08-12 03:57:52,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2024-08-12 03:57:54,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1448320.0, ans=10.0 2024-08-12 03:57:57,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1448420.0, ans=0.0 2024-08-12 03:58:23,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1448620.0, ans=0.125 2024-08-12 03:58:27,292 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 31 from Vox, 23 fro AS 2024-08-12 03:58:29,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1448620.0, ans=0.125 2024-08-12 03:58:31,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1448620.0, ans=0.125 2024-08-12 03:58:37,029 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 03:58:48,167 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 03:58:52,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 10, batch 14450, loss[loss=0.1113, beats_loss=0.01099, ecapa_loss=0.0001702, whisper_loss=0.09862, over 23130.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01116, ecapa_loss=0.000183, whisper_loss=0.09221, over 3963343.05 frames. 
], batch size: 91, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:59:08,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1448920.0, ans=0.1 2024-08-12 03:59:11,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1448920.0, ans=15.0 2024-08-12 03:59:22,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1449020.0, ans=0.2 2024-08-12 03:59:23,415 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 03:59:27,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1449020.0, ans=0.125 2024-08-12 03:59:32,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1449120.0, ans=0.125 2024-08-12 03:59:34,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.542e+01 2.850e+01 3.301e+01 1.207e+02, threshold=5.700e+01, percent-clipped=1.0 2024-08-12 03:59:43,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1449120.0, ans=0.125 2024-08-12 03:59:47,057 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 04:00:35,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 0, loss[loss=0.1085, beats_loss=0.0125, ecapa_loss=0.000215, whisper_loss=0.09381, over 20117.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0125, ecapa_loss=0.000215, whisper_loss=0.09381, over 20117.00 frames. 
], batch size: 84, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:00:35,389 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 04:01:15,788 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0005978, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 04:01:27,899 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9675e-06, 1.6256e-02, 2.1151e-03, 3.3033e+00, 3.0915e-03, 3.5233e-02, 3.8801e-02, 1.7004e-02], device='cuda:3') 2024-08-12 04:01:31,110 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on SV_voxceleb1: loss=0.004953, beats_loss=0, ecapa_loss=0.0004953, whisper_loss=0, over 939242.00 frames. 2024-08-12 04:03:27,092 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on AT_audioset: loss=0.02449, beats_loss=0.02449, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 04:03:27,095 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 04:03:31,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1449260.0, ans=0.125 2024-08-12 04:03:44,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1449260.0, ans=0.0 2024-08-12 04:03:47,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1449260.0, ans=0.125 2024-08-12 04:04:15,459 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
20 from LS+wenet, 31 from Vox, 44 fro AS 2024-08-12 04:04:28,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1449460.0, ans=0.125 2024-08-12 04:04:31,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1449460.0, ans=0.0 2024-08-12 04:04:38,418 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 04:05:33,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 50, loss[loss=0.1089, beats_loss=0.009634, ecapa_loss=0.0002291, whisper_loss=0.09695, over 22384.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01057, ecapa_loss=0.0001875, whisper_loss=0.09189, over 895861.26 frames. ], batch size: 92, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:05:39,716 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 04:05:40,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1449760.0, ans=0.125 2024-08-12 04:05:58,714 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 04:06:25,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.21 vs. 
limit=22.5 2024-08-12 04:06:27,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1449960.0, ans=0.125 2024-08-12 04:06:42,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1449960.0, ans=0.2 2024-08-12 04:07:08,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.961e+01 3.212e+01 3.624e+01 5.944e+01, threshold=6.424e+01, percent-clipped=1.0 2024-08-12 04:07:19,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1450160.0, ans=0.1 2024-08-12 04:07:30,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 100, loss[loss=0.08072, beats_loss=0.01228, ecapa_loss=0.0001933, whisper_loss=0.06651, over 16123.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001857, whisper_loss=0.08972, over 1519549.06 frames. ], batch size: 65, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:07:42,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1450260.0, ans=0.125 2024-08-12 04:07:51,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=15.0 2024-08-12 04:08:27,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. 
limit=15.0 2024-08-12 04:09:04,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1450560.0, ans=0.125 2024-08-12 04:09:25,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1450560.0, ans=0.2 2024-08-12 04:09:28,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1450560.0, ans=0.125 2024-08-12 04:09:29,175 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 04:09:32,260 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 04:09:55,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 150, loss[loss=0.07418, beats_loss=0.01195, ecapa_loss=0.0001627, whisper_loss=0.0606, over 16514.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001849, whisper_loss=0.09031, over 2032050.53 frames. ], batch size: 66, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:10:05,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2024-08-12 04:10:41,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1450860.0, ans=0.0 2024-08-12 04:10:51,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1450960.0, ans=0.125 2024-08-12 04:10:57,875 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 04:11:10,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1450960.0, ans=0.125 2024-08-12 04:11:14,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1451060.0, ans=0.0 2024-08-12 04:11:18,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.98 vs. limit=22.5 2024-08-12 04:11:33,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451060.0, ans=0.1 2024-08-12 04:11:38,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.724e+01 3.107e+01 3.626e+01 6.235e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-12 04:12:04,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 200, loss[loss=0.1245, beats_loss=0.0104, ecapa_loss=0.0001512, whisper_loss=0.1126, over 15526.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01054, ecapa_loss=0.0001825, whisper_loss=0.09211, over 2419587.16 frames. ], batch size: 57, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:12:29,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1451360.0, ans=0.125 2024-08-12 04:12:38,768 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 04:12:49,590 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 04:13:14,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.39 vs. limit=22.5 2024-08-12 04:13:31,907 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 04:14:04,534 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 250, loss[loss=0.1055, beats_loss=0.01224, ecapa_loss=0.0001617, whisper_loss=0.0916, over 16748.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01061, ecapa_loss=0.0001837, whisper_loss=0.09269, over 2730771.12 frames. ], batch size: 67, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:14:04,629 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 04:14:17,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1451760.0, ans=0.125 2024-08-12 04:14:20,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1451760.0, ans=0.125 2024-08-12 04:14:22,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1451760.0, ans=0.125 2024-08-12 04:14:23,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2024-08-12 04:14:26,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1451860.0, ans=0.2 2024-08-12 04:14:30,392 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 04:14:36,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1451860.0, ans=0.0 2024-08-12 04:14:39,782 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
27 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 04:14:42,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1451860.0, ans=0.0 2024-08-12 04:14:44,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1451860.0, ans=0.0 2024-08-12 04:15:05,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-12 04:15:35,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1452060.0, ans=0.125 2024-08-12 04:15:38,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1452160.0, ans=0.0 2024-08-12 04:15:40,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1452160.0, ans=0.0 2024-08-12 04:15:41,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.465e+01 2.658e+01 3.015e+01 5.855e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-12 04:15:47,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1452160.0, ans=0.1 2024-08-12 04:16:03,561 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 300, loss[loss=0.09231, beats_loss=0.01306, ecapa_loss=0.0001954, whisper_loss=0.07729, over 14915.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001842, whisper_loss=0.0918, over 2967153.70 frames. ], batch size: 62, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:16:07,161 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 04:16:09,886 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 04:16:12,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1452260.0, ans=0.2 2024-08-12 04:16:13,990 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 04:16:16,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1452360.0, ans=0.0 2024-08-12 04:16:18,887 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 04:16:21,003 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 04:16:26,344 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 04:16:26,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1452360.0, ans=0.2 2024-08-12 04:16:32,153 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 04:16:42,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-12 04:16:45,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1452560.0, ans=0.2 2024-08-12 04:16:46,385 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 04:16:55,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-08-12 04:17:04,459 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 04:17:14,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 350, loss[loss=0.097, beats_loss=0.01088, ecapa_loss=0.0001557, whisper_loss=0.08456, over 15364.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001833, whisper_loss=0.09176, over 3128949.08 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:17:23,956 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 04:17:41,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-12 04:17:43,113 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 04:17:44,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1452960.0, ans=0.0 2024-08-12 04:17:51,959 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 04:18:15,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.542e+01 2.799e+01 3.205e+01 6.505e+01, threshold=5.597e+01, percent-clipped=2.0 2024-08-12 04:18:25,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-08-12 04:18:28,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 400, loss[loss=0.1162, beats_loss=0.008547, ecapa_loss=0.0002137, whisper_loss=0.1055, over 16953.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001829, whisper_loss=0.09116, over 3279800.55 frames. 
], batch size: 65, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:18:39,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1453260.0, ans=0.125 2024-08-12 04:18:48,924 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 04:19:05,097 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 04:19:15,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1453560.0, ans=0.07 2024-08-12 04:19:20,145 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.541e+05 2024-08-12 04:19:39,854 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.609e+00 2024-08-12 04:19:40,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 450, loss[loss=0.08962, beats_loss=0.0118, ecapa_loss=0.0001472, whisper_loss=0.07635, over 18162.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001825, whisper_loss=0.09102, over 3395284.53 frames. ], batch size: 70, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:19:56,734 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 04:20:06,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1453860.0, ans=0.0 2024-08-12 04:20:17,510 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 04:20:18,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-12 04:20:29,589 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
20 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-12 04:20:30,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1454060.0, ans=0.0 2024-08-12 04:20:35,692 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 23 from Vox, 12 fro AS 2024-08-12 04:20:41,369 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.543e+01 2.883e+01 3.316e+01 4.776e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-12 04:20:42,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-12 04:20:43,415 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 04:20:54,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 500, loss[loss=0.1125, beats_loss=0.01073, ecapa_loss=0.000211, whisper_loss=0.09962, over 21255.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001821, whisper_loss=0.09058, over 3474440.95 frames. ], batch size: 88, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:21:05,189 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 04:21:20,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1454360.0, ans=0.125 2024-08-12 04:21:52,899 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 04:21:58,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. 
limit=15.0 2024-08-12 04:22:00,833 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.125e+00 2024-08-12 04:22:03,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1454660.0, ans=0.125 2024-08-12 04:22:09,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 550, loss[loss=0.1129, beats_loss=0.01024, ecapa_loss=0.00015, whisper_loss=0.1011, over 17786.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001808, whisper_loss=0.0908, over 3558594.09 frames. ], batch size: 67, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:22:10,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1454760.0, ans=0.125 2024-08-12 04:22:16,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1454760.0, ans=0.1 2024-08-12 04:22:19,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 04:22:19,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1454760.0, ans=0.125 2024-08-12 04:22:22,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1454860.0, ans=0.0 2024-08-12 04:22:22,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1454860.0, ans=0.2 2024-08-12 04:22:27,966 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 04:22:29,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:29,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1454860.0, ans=0.2 2024-08-12 04:22:33,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:35,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1454860.0, ans=0.2 2024-08-12 04:22:38,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1454960.0, ans=0.0 2024-08-12 04:22:45,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1454960.0, ans=0.125 2024-08-12 04:22:47,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1454960.0, ans=0.125 2024-08-12 04:23:02,420 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 04:23:08,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.603e+01 2.842e+01 3.155e+01 5.740e+01, threshold=5.685e+01, percent-clipped=0.0 2024-08-12 04:23:16,737 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.006e-01 2024-08-12 04:23:16,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1455160.0, ans=0.0 2024-08-12 04:23:19,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1455160.0, ans=0.125 2024-08-12 04:23:21,968 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 600, loss[loss=0.1303, beats_loss=0.01011, ecapa_loss=0.0001469, whisper_loss=0.1187, over 23524.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.000179, whisper_loss=0.09137, over 3668145.13 frames. ], batch size: 89, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:23:26,706 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 04:23:34,112 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 04:23:34,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1455260.0, ans=0.125 2024-08-12 04:23:36,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1455360.0, ans=0.125 2024-08-12 04:23:36,958 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 04:23:37,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1455360.0, ans=0.0 2024-08-12 04:23:37,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2024-08-12 04:23:40,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1455360.0, ans=0.2 2024-08-12 04:23:44,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1455360.0, ans=0.05 2024-08-12 04:23:47,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1455360.0, ans=0.0 2024-08-12 04:23:48,666 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 04:23:49,995 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 04:23:54,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1455460.0, ans=0.125 2024-08-12 04:23:54,822 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 04:24:04,407 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 04:24:35,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 650, loss[loss=0.1087, beats_loss=0.009052, ecapa_loss=0.0001837, whisper_loss=0.09781, over 16771.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001777, whisper_loss=0.09099, over 3684949.76 frames. ], batch size: 64, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:24:43,154 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 04:24:52,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1455860.0, ans=0.125 2024-08-12 04:25:02,881 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 04:25:07,475 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 04:25:11,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-12 04:25:23,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-12 04:25:25,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1456060.0, ans=0.125 2024-08-12 04:25:27,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2024-08-12 04:25:35,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.475e+01 2.766e+01 3.282e+01 4.630e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 04:25:37,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1456160.0, ans=0.1 2024-08-12 04:25:43,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.12 vs. limit=22.5 2024-08-12 04:25:44,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2024-08-12 04:25:48,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 700, loss[loss=0.1062, beats_loss=0.01036, ecapa_loss=0.0001635, whisper_loss=0.09425, over 14843.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.000179, whisper_loss=0.09107, over 3714671.20 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:25:55,677 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 04:26:07,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1456360.0, ans=0.0 2024-08-12 04:26:15,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-12 04:26:23,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1456460.0, ans=0.125 2024-08-12 04:26:41,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1456560.0, ans=0.07 2024-08-12 04:26:56,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1456660.0, ans=0.0 2024-08-12 04:27:05,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1456660.0, ans=0.2 2024-08-12 04:27:07,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 750, loss[loss=0.1073, beats_loss=0.0121, ecapa_loss=0.0001845, whisper_loss=0.09336, over 18478.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001787, whisper_loss=0.091, over 3720045.06 frames. 
], batch size: 73, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:27:34,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.19 vs. limit=15.0 2024-08-12 04:28:01,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-12 04:28:07,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1457060.0, ans=0.125 2024-08-12 04:28:14,441 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 04:28:16,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.543e+01 2.919e+01 3.268e+01 8.785e+01, threshold=5.838e+01, percent-clipped=1.0 2024-08-12 04:28:28,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1457160.0, ans=0.2 2024-08-12 04:28:32,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 800, loss[loss=0.1305, beats_loss=0.01033, ecapa_loss=0.0002103, whisper_loss=0.118, over 23043.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01094, ecapa_loss=0.0001797, whisper_loss=0.09056, over 3751045.34 frames. ], batch size: 90, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:29:23,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.49 vs. 
limit=22.5 2024-08-12 04:29:24,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1457560.0, ans=0.125 2024-08-12 04:29:32,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1457560.0, ans=0.09899494936611666 2024-08-12 04:29:33,212 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 04:29:35,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5 2024-08-12 04:29:42,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-12 04:29:52,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 850, loss[loss=0.103, beats_loss=0.008955, ecapa_loss=0.0002022, whisper_loss=0.09204, over 17938.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01098, ecapa_loss=0.0001793, whisper_loss=0.09017, over 3771657.48 frames. ], batch size: 74, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:30:03,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1457760.0, ans=0.125 2024-08-12 04:30:09,501 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 04:30:19,944 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 04:30:31,559 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 04:30:33,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1457960.0, ans=0.0 2024-08-12 04:30:35,898 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 04:30:40,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1458060.0, ans=0.0 2024-08-12 04:30:47,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1458060.0, ans=0.125 2024-08-12 04:30:50,439 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 04:30:57,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.636e+01 2.987e+01 3.471e+01 7.869e+01, threshold=5.974e+01, percent-clipped=5.0 2024-08-12 04:31:02,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1458160.0, ans=0.125 2024-08-12 04:31:08,310 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 04:31:10,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 900, loss[loss=0.09417, beats_loss=0.01405, ecapa_loss=0.0001768, whisper_loss=0.07835, over 21438.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01099, ecapa_loss=0.0001782, whisper_loss=0.09004, over 3765273.03 frames. ], batch size: 89, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:31:26,785 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 04:31:27,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1458360.0, ans=0.1 2024-08-12 04:31:40,810 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 04:31:43,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.76 vs. limit=22.5 2024-08-12 04:31:54,563 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 04:31:56,979 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:32:05,567 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-12 04:32:13,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=12.0 2024-08-12 04:32:16,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1458660.0, ans=0.2 2024-08-12 04:32:20,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-12 04:32:26,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1458660.0, ans=0.07 2024-08-12 04:32:26,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2024-08-12 04:32:31,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1458660.0, ans=0.0 2024-08-12 04:32:34,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 950, loss[loss=0.1078, beats_loss=0.01089, ecapa_loss=0.0001545, whisper_loss=0.09536, over 17561.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01107, ecapa_loss=0.0001778, whisper_loss=0.08986, over 3769086.51 frames. ], batch size: 67, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:32:35,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1458760.0, ans=0.0 2024-08-12 04:32:51,416 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 04:32:56,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=12.0 2024-08-12 04:32:59,748 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 04:33:09,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1458960.0, ans=0.125 2024-08-12 04:33:13,306 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 04:33:19,030 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 04:33:32,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459060.0, ans=0.1 2024-08-12 04:33:44,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.653e+01 2.939e+01 3.386e+01 4.997e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-12 04:33:51,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1459160.0, ans=0.2 2024-08-12 04:34:00,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1000, loss[loss=0.1145, beats_loss=0.007559, ecapa_loss=0.0001467, whisper_loss=0.1055, over 16684.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01103, ecapa_loss=0.0001764, whisper_loss=0.09036, over 3789344.45 frames. ], batch size: 57, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:34:21,813 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-12 04:34:27,265 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 04:34:43,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1459460.0, ans=0.125 2024-08-12 04:34:51,395 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 04:35:05,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-12 04:35:17,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1459660.0, ans=0.2 2024-08-12 04:35:19,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1459660.0, ans=0.0 2024-08-12 04:35:21,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1050, loss[loss=0.09937, beats_loss=0.01081, ecapa_loss=0.0002108, whisper_loss=0.08645, over 22974.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01094, ecapa_loss=0.0001777, whisper_loss=0.09054, over 3821132.07 frames. ], batch size: 94, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:35:25,785 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 04:35:33,569 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 04:35:36,911 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 04:36:06,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1459960.0, ans=0.0 2024-08-12 04:36:09,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1459960.0, ans=0.125 2024-08-12 04:36:18,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1460060.0, ans=0.125 2024-08-12 04:36:33,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.762e+01 2.974e+01 3.480e+01 4.829e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-12 04:36:38,601 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:36:48,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1100, loss[loss=0.1143, beats_loss=0.008518, ecapa_loss=0.0001472, whisper_loss=0.1043, over 16548.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001748, whisper_loss=0.09097, over 3825400.91 frames. ], batch size: 58, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:36:53,087 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 04:37:01,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5 2024-08-12 04:37:16,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1460360.0, ans=0.125 2024-08-12 04:37:22,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1460460.0, ans=0.0 2024-08-12 04:37:33,458 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
15 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 04:37:42,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1460560.0, ans=0.0 2024-08-12 04:37:42,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1460560.0, ans=0.125 2024-08-12 04:38:13,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1150, loss[loss=0.1065, beats_loss=0.006256, ecapa_loss=0.0002246, whisper_loss=0.09801, over 16221.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001764, whisper_loss=0.09188, over 3834483.59 frames. ], batch size: 63, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:38:23,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1460760.0, ans=0.125 2024-08-12 04:38:23,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-12 04:39:18,489 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 04:39:19,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.588e+01 2.774e+01 3.143e+01 5.777e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 04:39:33,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1200, loss[loss=0.09149, beats_loss=0.01279, ecapa_loss=0.000153, whisper_loss=0.07716, over 19990.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001756, whisper_loss=0.09133, over 3841072.38 frames. ], batch size: 81, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:39:39,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. 
limit=15.0 2024-08-12 04:39:45,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1461260.0, ans=0.0 2024-08-12 04:39:53,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1461360.0, ans=0.125 2024-08-12 04:40:04,764 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 9 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 04:40:09,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.40 vs. limit=15.0 2024-08-12 04:40:18,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1461460.0, ans=0.125 2024-08-12 04:40:22,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1461560.0, ans=0.0 2024-08-12 04:40:41,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1461660.0, ans=0.125 2024-08-12 04:40:45,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1461660.0, ans=0.1 2024-08-12 04:40:57,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1250, loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001736, whisper_loss=0.09246, over 16792.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.000175, whisper_loss=0.09111, over 3834049.32 frames. ], batch size: 65, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:40:58,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1461760.0, ans=0.035 2024-08-12 04:41:01,310 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 04:41:24,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1461860.0, ans=0.1 2024-08-12 04:41:26,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1461860.0, ans=0.125 2024-08-12 04:41:46,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=12.0 2024-08-12 04:41:47,747 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-12 04:41:55,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1462060.0, ans=0.0 2024-08-12 04:42:07,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1462160.0, ans=0.125 2024-08-12 04:42:08,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.564e+01 2.833e+01 3.209e+01 5.019e+01, threshold=5.666e+01, percent-clipped=0.0 2024-08-12 04:42:24,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1300, loss[loss=0.07488, beats_loss=0.01344, ecapa_loss=0.0001349, whisper_loss=0.06009, over 16724.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001765, whisper_loss=0.09118, over 3816740.21 frames. 
], batch size: 65, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:43:02,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1462460.0, ans=0.0 2024-08-12 04:43:19,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1462560.0, ans=0.2 2024-08-12 04:43:37,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-08-12 04:43:46,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1350, loss[loss=0.1337, beats_loss=0.006299, ecapa_loss=0.0001881, whisper_loss=0.1255, over 16217.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.0001758, whisper_loss=0.09111, over 3850632.51 frames. ], batch size: 60, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:43:48,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1462760.0, ans=0.0 2024-08-12 04:44:00,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1462760.0, ans=0.0 2024-08-12 04:44:02,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2024-08-12 04:44:03,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.81 vs. 
limit=15.0 2024-08-12 04:44:19,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1462860.0, ans=0.2 2024-08-12 04:44:20,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1462860.0, ans=0.125 2024-08-12 04:44:37,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1462960.0, ans=0.0 2024-08-12 04:44:38,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1463060.0, ans=0.125 2024-08-12 04:44:58,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.596e+01 2.848e+01 3.248e+01 6.741e+01, threshold=5.696e+01, percent-clipped=1.0 2024-08-12 04:44:59,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1463160.0, ans=0.02 2024-08-12 04:45:11,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1400, loss[loss=0.06787, beats_loss=0.01594, ecapa_loss=0.0001257, whisper_loss=0.05067, over 14228.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001736, whisper_loss=0.09118, over 3852828.55 frames. ], batch size: 57, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:45:16,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1463260.0, ans=0.125 2024-08-12 04:45:18,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1463260.0, ans=0.0 2024-08-12 04:45:28,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. 
limit=15.0 2024-08-12 04:45:46,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1463460.0, ans=0.125 2024-08-12 04:46:04,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1463560.0, ans=0.125 2024-08-12 04:46:07,052 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 04:46:10,692 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 04:46:12,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1463560.0, ans=0.0 2024-08-12 04:46:17,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1463660.0, ans=0.2 2024-08-12 04:46:54,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1463660.0, ans=0.1 2024-08-12 04:46:54,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1463660.0, ans=0.125 2024-08-12 04:46:59,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1450, loss[loss=0.08015, beats_loss=0.01112, ecapa_loss=0.000162, whisper_loss=0.0674, over 15548.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01106, ecapa_loss=0.0001724, whisper_loss=0.09066, over 3818835.30 frames. ], batch size: 61, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:47:06,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=8.0 2024-08-12 04:47:11,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. 
limit=6.0 2024-08-12 04:48:05,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.800e+01 3.262e+01 9.547e+01, threshold=5.600e+01, percent-clipped=2.0 2024-08-12 04:48:20,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-12 04:48:20,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1500, loss[loss=0.1062, beats_loss=0.01297, ecapa_loss=0.0001343, whisper_loss=0.09192, over 20611.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0111, ecapa_loss=0.0001724, whisper_loss=0.09035, over 3845685.94 frames. ], batch size: 79, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:48:26,752 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 04:48:28,298 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-12 04:48:56,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.37 vs. limit=22.5 2024-08-12 04:49:11,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1464560.0, ans=0.125 2024-08-12 04:49:25,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-08-12 04:49:28,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1464660.0, ans=0.09899494936611666 2024-08-12 04:49:28,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1464660.0, ans=0.125 2024-08-12 04:49:39,938 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 04:49:40,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1550, loss[loss=0.1131, beats_loss=0.01176, ecapa_loss=0.0001624, whisper_loss=0.09968, over 23297.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01109, ecapa_loss=0.0001734, whisper_loss=0.09062, over 3846935.90 frames. ], batch size: 88, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:49:46,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1464760.0, ans=0.09899494936611666 2024-08-12 04:50:16,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1464960.0, ans=0.125 2024-08-12 04:50:19,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0 2024-08-12 04:50:45,175 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.381e+01 2.640e+01 3.042e+01 4.916e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-12 04:50:59,582 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1600, loss[loss=0.09508, beats_loss=0.0119, ecapa_loss=0.0001617, whisper_loss=0.08157, over 20196.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01102, ecapa_loss=0.0001736, whisper_loss=0.09174, over 3858962.16 frames. ], batch size: 81, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:51:13,427 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 04:51:19,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1465360.0, ans=0.035 2024-08-12 04:51:35,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1465460.0, ans=0.125 2024-08-12 04:51:54,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1465560.0, ans=0.0 2024-08-12 04:51:56,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1465560.0, ans=0.1 2024-08-12 04:51:59,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1465560.0, ans=0.0 2024-08-12 04:52:08,172 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 04:52:15,532 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 04:52:16,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1650, loss[loss=0.1114, beats_loss=0.008429, ecapa_loss=0.0001832, whisper_loss=0.1011, over 19536.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001734, whisper_loss=0.09199, over 3845082.86 frames. ], batch size: 80, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:52:27,610 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 04:52:30,338 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 04:52:49,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1465960.0, ans=0.125 2024-08-12 04:52:50,310 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 04:52:52,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1465960.0, ans=0.125 2024-08-12 04:53:17,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1466160.0, ans=0.125 2024-08-12 04:53:19,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.459e+01 2.653e+01 3.242e+01 4.506e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-12 04:53:25,505 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 04:53:33,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1700, loss[loss=0.1216, beats_loss=0.008268, ecapa_loss=0.0002286, whisper_loss=0.111, over 22199.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001744, whisper_loss=0.0923, over 3841437.67 frames. ], batch size: 88, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:53:51,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1466360.0, ans=0.1 2024-08-12 04:53:53,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-12 04:53:53,901 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 04:53:54,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1466360.0, ans=0.0 2024-08-12 04:54:12,022 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 04:54:18,174 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 04:54:45,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1466660.0, ans=0.125 2024-08-12 04:54:50,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1750, loss[loss=0.1255, beats_loss=0.008921, ecapa_loss=0.0001938, whisper_loss=0.1147, over 22826.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001747, whisper_loss=0.09177, over 3823230.32 frames. ], batch size: 89, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:54:59,943 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 04:55:09,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1466860.0, ans=0.2 2024-08-12 04:55:11,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1466860.0, ans=0.125 2024-08-12 04:55:13,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1466860.0, ans=0.05 2024-08-12 04:55:18,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0 2024-08-12 04:55:22,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1466960.0, ans=0.2 2024-08-12 04:55:30,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466960.0, ans=0.1 2024-08-12 04:55:39,046 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 04:55:53,975 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.424e+01 2.723e+01 3.040e+01 5.517e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-12 04:56:05,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=15.0 2024-08-12 04:56:07,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1800, loss[loss=0.1194, beats_loss=0.009275, ecapa_loss=0.0001952, whisper_loss=0.1082, over 22997.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001745, whisper_loss=0.09113, over 3792355.63 frames. ], batch size: 91, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:56:16,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1467260.0, ans=0.125 2024-08-12 04:56:28,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1467360.0, ans=0.125 2024-08-12 04:56:31,058 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 04:56:51,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1467560.0, ans=0.0 2024-08-12 04:56:51,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0 2024-08-12 04:57:12,031 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 04:57:17,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0 2024-08-12 04:57:20,128 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 04:57:21,783 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 04:57:24,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1850, loss[loss=0.08932, beats_loss=0.01115, ecapa_loss=0.0001894, whisper_loss=0.07627, over 21089.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.000175, whisper_loss=0.09109, over 3791917.71 frames. ], batch size: 88, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:57:33,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-12 04:57:34,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1467760.0, ans=0.0 2024-08-12 04:57:34,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=15.0 2024-08-12 04:57:39,821 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 04:57:40,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1467860.0, ans=0.125 2024-08-12 04:58:05,913 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 04:58:27,083 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.532e+01 2.817e+01 3.253e+01 1.073e+02, threshold=5.635e+01, percent-clipped=1.0 2024-08-12 04:58:33,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1468160.0, ans=0.04949747468305833 2024-08-12 04:58:39,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1468160.0, ans=0.0 2024-08-12 04:58:41,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1900, loss[loss=0.09537, beats_loss=0.01418, ecapa_loss=0.0001514, whisper_loss=0.07968, over 18189.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.011, ecapa_loss=0.000176, whisper_loss=0.09058, over 3794457.93 frames. ], batch size: 71, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:59:07,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1468360.0, ans=0.0 2024-08-12 04:59:11,721 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 04:59:12,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1468460.0, ans=0.1 2024-08-12 04:59:19,867 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 04:59:59,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 1950, loss[loss=0.08298, beats_loss=0.01219, ecapa_loss=0.0001886, whisper_loss=0.06891, over 18984.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01101, ecapa_loss=0.0001767, whisper_loss=0.09012, over 3784406.67 frames. 
], batch size: 80, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:00:24,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1468860.0, ans=0.0 2024-08-12 05:00:26,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1468860.0, ans=0.125 2024-08-12 05:00:35,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1468960.0, ans=0.09899494936611666 2024-08-12 05:00:44,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1469060.0, ans=0.125 2024-08-12 05:01:01,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.456e+01 2.694e+01 2.989e+01 6.245e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-12 05:01:15,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2000, loss[loss=0.1089, beats_loss=0.01032, ecapa_loss=0.000195, whisper_loss=0.09659, over 22263.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.011, ecapa_loss=0.0001778, whisper_loss=0.09066, over 3795164.86 frames. ], batch size: 90, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:01:15,908 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 05:01:22,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1469260.0, ans=0.1 2024-08-12 05:01:23,748 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 05:01:27,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1469260.0, ans=0.125 2024-08-12 05:01:42,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1469360.0, ans=0.0 2024-08-12 05:01:43,998 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 05:01:46,748 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 05:01:56,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1469460.0, ans=0.125 2024-08-12 05:02:00,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1469460.0, ans=0.0 2024-08-12 05:02:04,626 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 05:02:10,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1469560.0, ans=0.1 2024-08-12 05:02:34,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2050, loss[loss=0.1115, beats_loss=0.01204, ecapa_loss=0.0001789, whisper_loss=0.09771, over 19707.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001775, whisper_loss=0.09114, over 3843987.85 frames. ], batch size: 77, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:03:25,455 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.078e-02 2024-08-12 05:03:28,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1470060.0, ans=0.2 2024-08-12 05:03:33,964 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 05:03:37,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.542e+01 2.738e+01 3.129e+01 4.867e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-12 05:03:44,414 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 05:03:50,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2100, loss[loss=0.1112, beats_loss=0.01022, ecapa_loss=0.000163, whisper_loss=0.09938, over 17701.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01096, ecapa_loss=0.0001763, whisper_loss=0.09194, over 3815868.26 frames. ], batch size: 68, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:04:04,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1470360.0, ans=0.125 2024-08-12 05:04:15,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1470360.0, ans=0.1 2024-08-12 05:04:29,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.80 vs. limit=10.0 2024-08-12 05:04:38,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1470560.0, ans=15.0 2024-08-12 05:04:45,678 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 05:04:46,149 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.643e-01 2024-08-12 05:04:46,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1470560.0, ans=0.0 2024-08-12 05:04:48,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1470560.0, ans=0.125 2024-08-12 05:04:48,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1470560.0, ans=0.1 2024-08-12 05:04:58,793 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 05:05:07,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2150, loss[loss=0.1225, beats_loss=0.008383, ecapa_loss=0.0002103, whisper_loss=0.1121, over 18943.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0111, ecapa_loss=0.0001768, whisper_loss=0.09112, over 3820674.10 frames. ], batch size: 76, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:05:10,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-12 05:05:11,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1470760.0, ans=0.1 2024-08-12 05:05:13,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-12 05:05:28,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. 
limit=22.5 2024-08-12 05:05:31,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1470860.0, ans=0.07 2024-08-12 05:05:34,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1470860.0, ans=0.125 2024-08-12 05:05:38,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1470960.0, ans=0.125 2024-08-12 05:05:56,211 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 05:05:58,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1471060.0, ans=0.125 2024-08-12 05:06:09,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.509e+01 2.893e+01 3.375e+01 5.887e+01, threshold=5.785e+01, percent-clipped=2.0 2024-08-12 05:06:14,422 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.419e+05 2024-08-12 05:06:17,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2024-08-12 05:06:23,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2200, loss[loss=0.08604, beats_loss=0.01233, ecapa_loss=0.0001593, whisper_loss=0.07211, over 19621.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.000177, whisper_loss=0.09183, over 3813251.46 frames. ], batch size: 80, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:06:28,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=15.0 2024-08-12 05:06:32,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2024-08-12 05:06:37,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2024-08-12 05:06:40,509 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 05:06:45,112 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 05:06:58,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1471460.0, ans=0.2 2024-08-12 05:07:00,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=12.0 2024-08-12 05:07:11,466 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 34 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-12 05:07:12,874 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 05:07:15,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=12.0 2024-08-12 05:07:37,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1471660.0, ans=0.2 2024-08-12 05:07:39,821 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 05:07:40,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. 
limit=6.0 2024-08-12 05:07:41,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2250, loss[loss=0.116, beats_loss=0.01092, ecapa_loss=0.0001913, whisper_loss=0.1031, over 18925.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.0001787, whisper_loss=0.09338, over 3870367.97 frames. ], batch size: 77, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:07:48,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1471760.0, ans=0.125 2024-08-12 05:08:03,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1471860.0, ans=0.1 2024-08-12 05:08:24,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2024-08-12 05:08:30,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1471960.0, ans=0.125 2024-08-12 05:08:46,341 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 05:08:51,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1472160.0, ans=0.1 2024-08-12 05:08:54,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.613e+01 2.941e+01 3.406e+01 8.387e+01, threshold=5.883e+01, percent-clipped=3.0 2024-08-12 05:09:11,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2300, loss[loss=0.1318, beats_loss=0.008958, ecapa_loss=0.0001863, whisper_loss=0.121, over 22813.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01104, ecapa_loss=0.0001786, whisper_loss=0.09371, over 3889670.40 frames. 
], batch size: 88, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:09:15,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1472260.0, ans=0.1 2024-08-12 05:09:18,976 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 05:09:36,347 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 05:10:09,138 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 05:10:12,702 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 05:10:24,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.18 vs. limit=22.5 2024-08-12 05:10:26,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1472660.0, ans=0.125 2024-08-12 05:10:29,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-12 05:10:31,000 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 05:10:33,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1472660.0, ans=0.0 2024-08-12 05:10:42,184 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 05:10:46,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2350, loss[loss=0.1029, beats_loss=0.011, ecapa_loss=0.0001916, whisper_loss=0.09, over 21738.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001793, whisper_loss=0.09266, over 3849153.58 frames. 
], batch size: 89, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:10:54,981 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 05:10:59,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1472760.0, ans=0.1 2024-08-12 05:11:13,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1472860.0, ans=0.125 2024-08-12 05:11:56,634 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 32 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 05:12:05,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1473060.0, ans=0.125 2024-08-12 05:12:08,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1473060.0, ans=0.125 2024-08-12 05:12:09,272 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 05:12:18,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.614e+01 3.008e+01 3.445e+01 5.971e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-12 05:12:24,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1473160.0, ans=0.1 2024-08-12 05:12:34,635 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 05:12:37,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2400, loss[loss=0.09845, beats_loss=0.01086, ecapa_loss=0.0001831, whisper_loss=0.08576, over 19687.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01098, ecapa_loss=0.0001801, whisper_loss=0.09297, over 3862643.28 frames. 
], batch size: 79, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:12:43,129 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 05:13:07,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1473360.0, ans=0.09899494936611666 2024-08-12 05:13:14,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1473360.0, ans=0.125 2024-08-12 05:13:20,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1473460.0, ans=0.125 2024-08-12 05:13:23,886 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 05:13:31,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1473460.0, ans=0.125 2024-08-12 05:13:44,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1473560.0, ans=0.125 2024-08-12 05:14:20,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2450, loss[loss=0.08282, beats_loss=0.01074, ecapa_loss=0.0001759, whisper_loss=0.07033, over 18649.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.0001804, whisper_loss=0.09242, over 3892422.39 frames. ], batch size: 76, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:14:23,317 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.185e-02 2024-08-12 05:14:41,657 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 05:14:41,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1473860.0, ans=0.0 2024-08-12 05:14:58,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1473960.0, ans=0.5 2024-08-12 05:14:59,518 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 05:15:02,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1473960.0, ans=0.0 2024-08-12 05:15:21,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1474060.0, ans=0.125 2024-08-12 05:15:33,859 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 05:15:38,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.575e+01 2.893e+01 3.388e+01 4.265e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 05:15:43,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1474160.0, ans=0.125 2024-08-12 05:15:46,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1474160.0, ans=0.125 2024-08-12 05:15:51,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2500, loss[loss=0.1095, beats_loss=0.0111, ecapa_loss=0.0001852, whisper_loss=0.09654, over 22336.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01111, ecapa_loss=0.0001799, whisper_loss=0.09202, over 3913394.00 frames. 
], batch size: 90, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:15:55,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0 2024-08-12 05:15:59,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474260.0, ans=0.1 2024-08-12 05:16:00,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-12 05:16:06,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1474360.0, ans=0.125 2024-08-12 05:16:22,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1474460.0, ans=0.2 2024-08-12 05:16:24,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1474460.0, ans=0.125 2024-08-12 05:16:26,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474460.0, ans=0.1 2024-08-12 05:16:27,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1474460.0, ans=0.07 2024-08-12 05:16:30,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1474560.0, ans=0.07 2024-08-12 05:16:33,504 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 05:16:36,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1474560.0, ans=0.125 2024-08-12 05:16:42,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1474660.0, ans=0.125 2024-08-12 05:16:54,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2550, loss[loss=0.1122, beats_loss=0.009555, ecapa_loss=0.0002507, whisper_loss=0.1002, over 19153.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001792, whisper_loss=0.09252, over 3929296.29 frames. ], batch size: 80, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:16:55,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2024-08-12 05:17:04,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1474760.0, ans=0.09899494936611666 2024-08-12 05:17:12,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-12 05:17:30,203 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 05:17:40,286 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 05:17:44,264 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 05:17:47,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.613e+01 2.908e+01 3.447e+01 1.061e+02, threshold=5.817e+01, percent-clipped=1.0 2024-08-12 05:17:53,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2024-08-12 05:17:59,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2600, loss[loss=0.1043, beats_loss=0.01147, ecapa_loss=0.0001819, whisper_loss=0.09103, over 16683.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001778, whisper_loss=0.09283, over 3920087.47 frames. ], batch size: 65, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:18:12,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1475360.0, ans=0.1 2024-08-12 05:18:24,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. limit=10.0 2024-08-12 05:18:24,690 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 05:18:49,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2024-08-12 05:18:52,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1475660.0, ans=0.0 2024-08-12 05:18:58,626 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 05:19:03,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2650, loss[loss=0.1042, beats_loss=0.01138, ecapa_loss=0.0001943, whisper_loss=0.09091, over 20411.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01098, ecapa_loss=0.0001792, whisper_loss=0.09315, over 3923133.32 frames. 
], batch size: 81, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:19:14,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475760.0, ans=0.1 2024-08-12 05:19:15,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1475860.0, ans=0.2 2024-08-12 05:19:16,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1475860.0, ans=0.125 2024-08-12 05:19:25,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1475860.0, ans=0.0 2024-08-12 05:19:26,215 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 05:19:32,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1475960.0, ans=0.05 2024-08-12 05:19:42,765 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 05:19:56,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.506e+01 2.786e+01 3.189e+01 5.235e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-12 05:20:08,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2700, loss[loss=0.08812, beats_loss=0.01062, ecapa_loss=0.0001836, whisper_loss=0.07566, over 16801.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001801, whisper_loss=0.09259, over 3919745.60 frames. ], batch size: 66, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:20:08,834 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 05:20:11,352 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 05:20:22,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1476360.0, ans=0.2 2024-08-12 05:20:31,706 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 05:20:43,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1476460.0, ans=0.125 2024-08-12 05:20:53,434 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 05:21:03,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1476660.0, ans=0.125 2024-08-12 05:21:06,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1476660.0, ans=0.05 2024-08-12 05:21:12,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1476760.0, ans=0.125 2024-08-12 05:21:13,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2750, loss[loss=0.1113, beats_loss=0.009496, ecapa_loss=0.0001859, whisper_loss=0.09994, over 16161.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01101, ecapa_loss=0.0001804, whisper_loss=0.09245, over 3892222.21 frames. ], batch size: 61, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:21:35,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-12 05:21:45,390 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 05:21:47,886 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 05:22:02,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1477060.0, ans=0.04949747468305833 2024-08-12 05:22:05,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.574e+01 2.886e+01 3.333e+01 4.847e+01, threshold=5.772e+01, percent-clipped=0.0 2024-08-12 05:22:16,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1477260.0, ans=0.125 2024-08-12 05:22:17,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2800, loss[loss=0.1014, beats_loss=0.009777, ecapa_loss=0.0002159, whisper_loss=0.08944, over 12755.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01099, ecapa_loss=0.0001802, whisper_loss=0.09275, over 3884055.97 frames. ], batch size: 54, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:22:49,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1477460.0, ans=0.0 2024-08-12 05:22:51,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1477460.0, ans=0.125 2024-08-12 05:22:52,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2024-08-12 05:22:54,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.75 vs. 
limit=15.0 2024-08-12 05:22:56,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1477560.0, ans=0.1 2024-08-12 05:22:59,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1477560.0, ans=0.125 2024-08-12 05:23:04,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477560.0, ans=0.1 2024-08-12 05:23:19,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1477660.0, ans=0.2 2024-08-12 05:23:23,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1477660.0, ans=0.125 2024-08-12 05:23:25,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2850, loss[loss=0.101, beats_loss=0.01418, ecapa_loss=0.000172, whisper_loss=0.08506, over 20693.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001794, whisper_loss=0.0929, over 3866354.66 frames. ], batch size: 87, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:23:39,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477760.0, ans=0.1 2024-08-12 05:23:39,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1477760.0, ans=0.0 2024-08-12 05:23:40,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1477860.0, ans=0.0 2024-08-12 05:23:55,957 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 05:24:07,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1477960.0, ans=0.95 2024-08-12 05:24:12,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1478060.0, ans=0.125 2024-08-12 05:24:30,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 3.053e+01 3.517e+01 5.532e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-12 05:24:35,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-12 05:24:37,453 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 05:24:44,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2900, loss[loss=0.1024, beats_loss=0.01119, ecapa_loss=0.0001922, whisper_loss=0.08932, over 22019.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01108, ecapa_loss=0.0001811, whisper_loss=0.09277, over 3890647.81 frames. ], batch size: 90, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:24:50,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1478260.0, ans=0.125 2024-08-12 05:24:55,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1478260.0, ans=0.0 2024-08-12 05:25:01,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1478360.0, ans=0.125 2024-08-12 05:25:06,904 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 05:25:21,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1478460.0, ans=0.0 2024-08-12 05:25:55,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 2950, loss[loss=0.1026, beats_loss=0.01163, ecapa_loss=0.0002076, whisper_loss=0.08892, over 21261.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01094, ecapa_loss=0.0001831, whisper_loss=0.09294, over 3876910.81 frames. ], batch size: 89, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:26:07,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1478860.0, ans=0.125 2024-08-12 05:26:23,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1478960.0, ans=0.125 2024-08-12 05:26:48,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.658e+01 2.945e+01 3.393e+01 5.337e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 05:26:58,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1479160.0, ans=0.125 2024-08-12 05:27:00,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3000, loss[loss=0.09073, beats_loss=0.01215, ecapa_loss=0.0001927, whisper_loss=0.07665, over 18814.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01096, ecapa_loss=0.0001843, whisper_loss=0.09208, over 3885577.86 frames. ], batch size: 78, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:27:00,057 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 05:27:41,703 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on ASR_libri: loss=0.2561, beats_loss=0, ecapa_loss=0.0006006, whisper_loss=0.2501, over 922467.00 frames. 
2024-08-12 05:27:58,652 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on SV_voxceleb1: loss=0.004832, beats_loss=0, ecapa_loss=0.0004832, whisper_loss=0, over 939242.00 frames. 2024-08-12 05:30:00,083 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on AT_audioset: loss=0.02445, beats_loss=0.02445, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 05:30:00,087 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 05:30:07,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1479260.0, ans=0.125 2024-08-12 05:30:19,082 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-12 05:30:26,981 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 05:30:27,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1479460.0, ans=0.04949747468305833 2024-08-12 05:30:37,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1479560.0, ans=15.0 2024-08-12 05:30:44,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1479560.0, ans=0.0 2024-08-12 05:30:49,080 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 14 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-12 05:30:50,333 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 05:30:57,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1479660.0, ans=0.125 2024-08-12 05:31:04,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3050, loss[loss=0.1191, beats_loss=0.008522, ecapa_loss=0.0001915, whisper_loss=0.1086, over 15007.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001821, whisper_loss=0.09216, over 3862583.13 frames. ], batch size: 58, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:31:12,592 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 05:31:15,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1479760.0, ans=0.5 2024-08-12 05:31:21,515 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-12 05:31:31,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-12 05:31:33,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1479960.0, ans=0.125 2024-08-12 05:31:40,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1479960.0, ans=0.125 2024-08-12 05:31:45,025 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 05:31:46,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1480060.0, ans=0.0 2024-08-12 05:31:53,193 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 05:32:00,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.536e+01 2.925e+01 3.464e+01 9.985e+01, threshold=5.850e+01, percent-clipped=2.0 2024-08-12 05:32:02,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1480160.0, ans=0.0 2024-08-12 05:32:07,351 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 05:32:12,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3100, loss[loss=0.1075, beats_loss=0.009498, ecapa_loss=0.0002097, whisper_loss=0.09586, over 15919.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01104, ecapa_loss=0.0001815, whisper_loss=0.09206, over 3854461.87 frames. ], batch size: 61, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:32:12,412 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-12 05:32:20,203 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 05:32:39,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1480460.0, ans=0.1 2024-08-12 05:32:44,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1480460.0, ans=0.125 2024-08-12 05:32:46,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1480460.0, ans=0.07 2024-08-12 05:32:50,191 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 05:33:05,576 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 05:33:15,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1480660.0, ans=0.1 2024-08-12 05:33:17,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3150, loss[loss=0.08467, beats_loss=0.01351, ecapa_loss=0.000179, whisper_loss=0.06937, over 15414.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001806, whisper_loss=0.09249, over 3855535.43 frames. 
], batch size: 65, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:33:19,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1480760.0, ans=0.025 2024-08-12 05:33:23,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1480760.0, ans=0.0 2024-08-12 05:33:28,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2024-08-12 05:33:32,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1480860.0, ans=0.2 2024-08-12 05:33:43,364 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 05:33:44,617 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 05:33:54,020 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 05:34:03,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1481060.0, ans=0.125 2024-08-12 05:34:04,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1481060.0, ans=0.1 2024-08-12 05:34:10,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.633e+01 2.990e+01 3.410e+01 4.926e+01, threshold=5.980e+01, percent-clipped=0.0 2024-08-12 05:34:11,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.16 vs. 
limit=15.0 2024-08-12 05:34:20,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1481160.0, ans=0.125 2024-08-12 05:34:22,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3200, loss[loss=0.09521, beats_loss=0.01009, ecapa_loss=0.0002022, whisper_loss=0.0831, over 19048.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01103, ecapa_loss=0.0001804, whisper_loss=0.09343, over 3851893.44 frames. ], batch size: 76, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:34:32,782 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 05:34:38,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1481360.0, ans=0.125 2024-08-12 05:34:51,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1481460.0, ans=0.0 2024-08-12 05:34:57,700 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 05:35:00,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1481560.0, ans=0.0 2024-08-12 05:35:05,706 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 05:35:20,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1481660.0, ans=0.0 2024-08-12 05:35:27,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3250, loss[loss=0.1062, beats_loss=0.01304, ecapa_loss=0.000132, whisper_loss=0.09185, over 19484.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01098, ecapa_loss=0.0001825, whisper_loss=0.09411, over 3892110.99 frames. 
], batch size: 73, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:35:31,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1481760.0, ans=0.125 2024-08-12 05:35:42,045 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-12 05:35:48,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1481860.0, ans=0.5 2024-08-12 05:36:00,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1481960.0, ans=0.0 2024-08-12 05:36:10,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1482060.0, ans=0.125 2024-08-12 05:36:21,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.550e+01 2.874e+01 3.283e+01 4.994e+01, threshold=5.748e+01, percent-clipped=0.0 2024-08-12 05:36:27,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2024-08-12 05:36:33,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3300, loss[loss=0.1138, beats_loss=0.01203, ecapa_loss=0.0001393, whisper_loss=0.1004, over 22855.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01102, ecapa_loss=0.0001815, whisper_loss=0.09407, over 3880414.79 frames. ], batch size: 91, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:36:35,810 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 05:36:46,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1482360.0, ans=0.125 2024-08-12 05:36:50,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1482360.0, ans=0.0 2024-08-12 05:37:10,696 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 05:37:26,259 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 05:37:35,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1482660.0, ans=0.04949747468305833 2024-08-12 05:37:37,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3350, loss[loss=0.1258, beats_loss=0.008198, ecapa_loss=0.0002012, whisper_loss=0.1156, over 23331.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01093, ecapa_loss=0.0001838, whisper_loss=0.09351, over 3890211.06 frames. ], batch size: 93, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:37:43,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2024-08-12 05:37:50,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1482860.0, ans=0.125 2024-08-12 05:38:03,281 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 05:38:13,742 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 05:38:23,749 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 05:38:30,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.536e+01 3.034e+01 3.396e+01 1.773e+02, threshold=6.068e+01, percent-clipped=2.0 2024-08-12 05:38:34,759 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:38:42,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3400, loss[loss=0.1031, beats_loss=0.01167, ecapa_loss=0.0002091, whisper_loss=0.08939, over 20419.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001836, whisper_loss=0.09282, over 3888897.07 frames. ], batch size: 84, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:38:46,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1483260.0, ans=0.125 2024-08-12 05:38:49,027 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 05:38:56,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1483360.0, ans=0.125 2024-08-12 05:38:59,551 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 05:39:03,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1483360.0, ans=0.025 2024-08-12 05:39:50,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3450, loss[loss=0.1194, beats_loss=0.01125, ecapa_loss=0.00016, whisper_loss=0.1065, over 20451.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001837, whisper_loss=0.09237, over 3889398.07 frames. 
], batch size: 80, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:40:16,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1483960.0, ans=0.125 2024-08-12 05:40:27,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1483960.0, ans=0.0 2024-08-12 05:40:42,639 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 05:40:46,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.618e+01 3.064e+01 3.498e+01 5.812e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-12 05:40:49,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1484160.0, ans=0.2 2024-08-12 05:40:52,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1484160.0, ans=0.0 2024-08-12 05:40:59,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3500, loss[loss=0.1131, beats_loss=0.009781, ecapa_loss=0.0002263, whisper_loss=0.1011, over 16369.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001845, whisper_loss=0.09198, over 3873831.30 frames. ], batch size: 68, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:41:06,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1484260.0, ans=0.05 2024-08-12 05:41:12,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1484260.0, ans=0.0 2024-08-12 05:41:22,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1484360.0, ans=0.05 2024-08-12 05:42:05,155 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 05:42:08,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1484660.0, ans=0.95 2024-08-12 05:42:10,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3550, loss[loss=0.09314, beats_loss=0.01287, ecapa_loss=0.0001611, whisper_loss=0.07865, over 21859.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001826, whisper_loss=0.09165, over 3884322.90 frames. ], batch size: 89, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:42:21,639 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 05:42:24,467 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 05:42:27,427 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 05:42:37,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-08-12 05:42:41,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1484960.0, ans=0.0 2024-08-12 05:42:51,519 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 32 from Vox, 23 fro AS 2024-08-12 05:43:08,559 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 05:43:09,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.668e+01 2.975e+01 3.438e+01 5.088e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-12 05:43:17,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1485160.0, ans=0.05 2024-08-12 05:43:21,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1485260.0, ans=0.1 2024-08-12 05:43:22,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3600, loss[loss=0.0915, beats_loss=0.01136, ecapa_loss=0.0001851, whisper_loss=0.07829, over 18821.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001818, whisper_loss=0.09236, over 3896858.51 frames. ], batch size: 78, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:43:27,193 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 05:43:31,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1485260.0, ans=0.125 2024-08-12 05:43:42,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-08-12 05:43:54,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1485460.0, ans=0.125 2024-08-12 05:44:01,778 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 05:44:07,147 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
30 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 05:44:18,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-12 05:44:27,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1485660.0, ans=0.125 2024-08-12 05:44:30,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1485660.0, ans=0.125 2024-08-12 05:44:33,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3650, loss[loss=0.1088, beats_loss=0.01102, ecapa_loss=0.0001714, whisper_loss=0.09605, over 20181.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01105, ecapa_loss=0.0001809, whisper_loss=0.09269, over 3869440.45 frames. ], batch size: 79, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:44:47,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-12 05:45:27,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1486060.0, ans=0.125 2024-08-12 05:45:32,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.529e+01 2.870e+01 3.231e+01 5.224e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-12 05:45:36,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486160.0, ans=0.1 2024-08-12 05:45:45,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3700, loss[loss=0.09685, beats_loss=0.01226, ecapa_loss=0.0002005, whisper_loss=0.08258, over 15264.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01103, ecapa_loss=0.0001814, whisper_loss=0.09254, over 3832161.08 frames. 
], batch size: 61, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:46:05,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1486360.0, ans=0.0 2024-08-12 05:46:13,765 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 05:46:16,627 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 05:46:36,145 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 05:46:42,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1486660.0, ans=0.125 2024-08-12 05:46:57,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3750, loss[loss=0.1118, beats_loss=0.009146, ecapa_loss=0.000179, whisper_loss=0.1009, over 15347.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01098, ecapa_loss=0.0001827, whisper_loss=0.0931, over 3852564.83 frames. ], batch size: 58, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:47:07,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1486760.0, ans=0.2 2024-08-12 05:47:30,544 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-12 05:47:33,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1486960.0, ans=0.0 2024-08-12 05:47:45,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. 
limit=10.0 2024-08-12 05:47:49,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1487060.0, ans=0.125 2024-08-12 05:47:55,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.578e+01 2.846e+01 3.197e+01 4.164e+01, threshold=5.692e+01, percent-clipped=0.0 2024-08-12 05:47:55,645 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 05:48:05,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=1487160.0, ans=0.2 2024-08-12 05:48:09,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3800, loss[loss=0.08385, beats_loss=0.01257, ecapa_loss=0.0001598, whisper_loss=0.06968, over 16519.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01099, ecapa_loss=0.0001837, whisper_loss=0.09292, over 3893112.00 frames. ], batch size: 68, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:48:10,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1487260.0, ans=0.125 2024-08-12 05:48:20,447 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-12 05:48:23,094 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 05:48:25,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-08-12 05:48:29,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1487360.0, ans=0.1 2024-08-12 05:48:37,464 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 05:48:41,253 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 05:49:19,706 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 05:49:21,298 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 05:49:22,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3850, loss[loss=0.105, beats_loss=0.009765, ecapa_loss=0.0002018, whisper_loss=0.09323, over 16462.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01104, ecapa_loss=0.0001836, whisper_loss=0.09301, over 3883443.46 frames. ], batch size: 65, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:49:26,827 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 05:49:36,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1487860.0, ans=0.125 2024-08-12 05:49:39,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1487860.0, ans=0.125 2024-08-12 05:49:41,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1487860.0, ans=0.1 2024-08-12 05:49:42,064 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 05:49:47,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1487860.0, ans=0.0 2024-08-12 05:50:07,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2024-08-12 05:50:10,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. 
limit=15.0 2024-08-12 05:50:13,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1488060.0, ans=0.125 2024-08-12 05:50:20,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1488160.0, ans=0.125 2024-08-12 05:50:22,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.543e+01 2.911e+01 3.298e+01 4.140e+01, threshold=5.821e+01, percent-clipped=0.0 2024-08-12 05:50:33,183 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 05:50:33,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1488160.0, ans=0.125 2024-08-12 05:50:35,755 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3900, loss[loss=0.1172, beats_loss=0.009328, ecapa_loss=0.0001716, whisper_loss=0.1062, over 17661.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01104, ecapa_loss=0.000185, whisper_loss=0.09355, over 3921558.30 frames. ], batch size: 67, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:50:38,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2024-08-12 05:50:39,751 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-12 05:50:40,994 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 05:51:01,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1488360.0, ans=0.0 2024-08-12 05:51:18,958 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 05:51:31,107 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-12 05:51:33,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1488560.0, ans=0.0 2024-08-12 05:51:37,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1488660.0, ans=0.125 2024-08-12 05:51:37,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1488660.0, ans=0.125 2024-08-12 05:51:38,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1488660.0, ans=0.2 2024-08-12 05:51:44,600 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 05:51:51,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 3950, loss[loss=0.1217, beats_loss=0.01083, ecapa_loss=0.0002123, whisper_loss=0.1087, over 22138.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01094, ecapa_loss=0.0001855, whisper_loss=0.09446, over 3918445.50 frames. ], batch size: 92, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:51:52,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1488760.0, ans=0.95 2024-08-12 05:52:06,585 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 05:52:09,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0 2024-08-12 05:52:15,076 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 05:52:27,541 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 05:52:29,098 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 05:52:34,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1488960.0, ans=0.125 2024-08-12 05:52:42,751 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 05:52:43,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1489060.0, ans=0.125 2024-08-12 05:52:51,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-08-12 05:52:53,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.639e+01 2.878e+01 3.466e+01 7.368e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 05:52:56,946 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 05:53:07,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4000, loss[loss=0.0983, beats_loss=0.01308, ecapa_loss=0.0001834, whisper_loss=0.08339, over 20752.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01088, ecapa_loss=0.0001849, whisper_loss=0.0945, over 3902743.29 frames. ], batch size: 85, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:53:31,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-08-12 05:53:38,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.28 vs. 
limit=12.0 2024-08-12 05:53:46,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1489460.0, ans=0.2 2024-08-12 05:53:53,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1489560.0, ans=0.0 2024-08-12 05:53:57,287 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 05:54:11,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1489660.0, ans=0.125 2024-08-12 05:54:23,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4050, loss[loss=0.1003, beats_loss=0.009601, ecapa_loss=0.0001854, whisper_loss=0.08889, over 22489.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01101, ecapa_loss=0.0001844, whisper_loss=0.09338, over 3920328.88 frames. ], batch size: 90, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:54:52,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1489960.0, ans=0.035 2024-08-12 05:54:52,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1489960.0, ans=0.125 2024-08-12 05:54:56,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.42 vs. 
limit=12.0 2024-08-12 05:55:15,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1490060.0, ans=0.125 2024-08-12 05:55:17,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1490060.0, ans=0.1 2024-08-12 05:55:25,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.625e+01 2.909e+01 3.364e+01 7.852e+01, threshold=5.817e+01, percent-clipped=2.0 2024-08-12 05:55:39,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4100, loss[loss=0.0994, beats_loss=0.01351, ecapa_loss=0.000186, whisper_loss=0.08402, over 19713.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01104, ecapa_loss=0.000185, whisper_loss=0.09283, over 3925162.91 frames. ], batch size: 83, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:55:42,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0 2024-08-12 05:55:49,070 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 05:56:09,357 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 05:56:22,070 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 32 from LS+wenet, 15 from Vox, 15 fro AS 2024-08-12 05:56:25,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1490560.0, ans=0.1 2024-08-12 05:56:32,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-12 05:56:43,640 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
25 from LS+wenet, 15 from Vox, 29 from AS 2024-08-12 05:56:56,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4150, loss[loss=0.1118, beats_loss=0.01058, ecapa_loss=0.0001601, whisper_loss=0.09963, over 22568.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01101, ecapa_loss=0.0001845, whisper_loss=0.09336, over 3925053.00 frames. ], batch size: 86, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:56:59,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1490760.0, ans=0.0 2024-08-12 05:57:09,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1490760.0, ans=0.125 2024-08-12 05:57:24,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1490860.0, ans=0.0 2024-08-12 05:57:30,316 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 05:57:33,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1490960.0, ans=0.0 2024-08-12 05:57:57,145 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 27 from Vox, 26 from AS 2024-08-12 05:58:01,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.619e+01 2.867e+01 3.217e+01 5.431e+01, threshold=5.734e+01, percent-clipped=0.0 2024-08-12 05:58:15,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4200, loss[loss=0.09192, beats_loss=0.01101, ecapa_loss=0.0001516, whisper_loss=0.07939, over 18718.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.011, ecapa_loss=0.0001829, whisper_loss=0.09343, over 3929428.26 frames. ], batch size: 70, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:58:25,648 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
20 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 05:58:25,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1491260.0, ans=0.125 2024-08-12 05:58:27,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1491260.0, ans=0.2 2024-08-12 05:58:45,160 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 from AS 2024-08-12 05:59:03,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1491560.0, ans=0.0 2024-08-12 05:59:07,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1491560.0, ans=0.125 2024-08-12 05:59:34,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4250, loss[loss=0.1018, beats_loss=0.01204, ecapa_loss=0.0001858, whisper_loss=0.08789, over 21635.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.000183, whisper_loss=0.09292, over 3911544.64 frames. ], batch size: 91, lr: 5.80e-03, grad_scale: 1.152921504606847e+18 2024-08-12 05:59:35,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1491760.0, ans=0.0 2024-08-12 05:59:40,437 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-12 05:59:44,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1491760.0, ans=0.125 2024-08-12 05:59:47,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1491760.0, ans=0.1 2024-08-12 05:59:55,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1491860.0, ans=0.125 2024-08-12 06:00:00,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=22.5 2024-08-12 06:00:06,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1491960.0, ans=0.0 2024-08-12 06:00:08,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-08-12 06:00:29,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1492060.0, ans=0.0 2024-08-12 06:00:40,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.448e+01 2.725e+01 3.062e+01 4.978e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-12 06:00:49,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1492160.0, ans=0.05 2024-08-12 06:00:53,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1492160.0, ans=0.125 2024-08-12 06:00:56,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4300, loss[loss=0.1266, beats_loss=0.01019, ecapa_loss=0.0001526, whisper_loss=0.1148, over 20864.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.011, ecapa_loss=0.0001818, whisper_loss=0.09273, over 3889460.49 frames. ], batch size: 79, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:01:00,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1492260.0, ans=0.0 2024-08-12 06:01:05,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-08-12 06:01:12,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2024-08-12 06:01:19,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=15.0 2024-08-12 06:01:30,207 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-12 06:02:16,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4350, loss[loss=0.08434, beats_loss=0.01329, ecapa_loss=0.0001315, whisper_loss=0.06973, over 18095.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0109, ecapa_loss=0.0001834, whisper_loss=0.09306, over 3884117.17 frames. 
], batch size: 70, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:02:28,845 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:02:37,285 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.215e+05 2024-08-12 06:02:49,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1492960.0, ans=0.2 2024-08-12 06:02:49,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1492960.0, ans=0.125 2024-08-12 06:02:51,833 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 from AS 2024-08-12 06:02:55,172 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 from AS 2024-08-12 06:03:00,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-12 06:03:19,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1493060.0, ans=0.0 2024-08-12 06:03:25,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.573e+01 2.985e+01 3.568e+01 9.873e+01, threshold=5.969e+01, percent-clipped=3.0 2024-08-12 06:03:25,501 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS 2024-08-12 06:03:28,759 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 from AS 2024-08-12 06:03:29,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1493160.0, ans=0.125 2024-08-12 06:03:40,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4400, loss[loss=0.08822, beats_loss=0.01199, ecapa_loss=0.0001971, whisper_loss=0.07426, over 21565.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01088, ecapa_loss=0.0001837, whisper_loss=0.09346, over 3870191.91 frames. ], batch size: 90, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:03:55,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1493260.0, ans=0.125 2024-08-12 06:04:44,012 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 from AS 2024-08-12 06:05:02,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1493660.0, ans=0.125 2024-08-12 06:05:05,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4450, loss[loss=0.1044, beats_loss=0.01328, ecapa_loss=0.0002182, whisper_loss=0.0889, over 22655.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01097, ecapa_loss=0.0001834, whisper_loss=0.0931, over 3873639.47 frames. ], batch size: 96, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:05:36,475 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
16 from LS+wenet, 20 from Vox, 23 from AS 2024-08-12 06:05:59,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1494060.0, ans=0.125 2024-08-12 06:06:08,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1494060.0, ans=0.95 2024-08-12 06:06:13,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.568e+01 2.752e+01 3.153e+01 4.560e+01, threshold=5.503e+01, percent-clipped=0.0 2024-08-12 06:06:27,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.74 vs. limit=10.0 2024-08-12 06:06:29,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4500, loss[loss=0.0951, beats_loss=0.01227, ecapa_loss=0.0001559, whisper_loss=0.08127, over 18194.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.0001809, whisper_loss=0.09292, over 3906434.23 frames. ], batch size: 73, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:06:43,169 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 06:07:10,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1494460.0, ans=0.125 2024-08-12 06:07:13,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1494460.0, ans=0.125 2024-08-12 06:07:23,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1494560.0, ans=0.125 2024-08-12 06:07:32,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1494560.0, ans=0.035 2024-08-12 06:07:36,711 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
31 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 06:07:50,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-12 06:07:55,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4550, loss[loss=0.1237, beats_loss=0.009334, ecapa_loss=0.0002039, whisper_loss=0.1124, over 23653.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.011, ecapa_loss=0.0001803, whisper_loss=0.09369, over 3921278.73 frames. ], batch size: 95, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:08:10,663 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.280e-03 2024-08-12 06:08:20,927 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 from AS 2024-08-12 06:08:42,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1494960.0, ans=0.125 2024-08-12 06:08:46,730 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 06:08:51,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1495060.0, ans=0.125 2024-08-12 06:08:55,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1495060.0, ans=0.125 2024-08-12 06:09:00,414 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 06:09:03,571 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
20 from LS+wenet, 19 from Vox, 49 from AS 2024-08-12 06:09:05,204 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:09:05,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.505e+01 2.717e+01 3.004e+01 5.094e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-12 06:09:20,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4600, loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001774, whisper_loss=0.08999, over 17413.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001803, whisper_loss=0.09251, over 3912422.56 frames. ], batch size: 73, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:09:37,973 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS 2024-08-12 06:09:38,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1495360.0, ans=0.125 2024-08-12 06:09:44,115 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.765e+00 2024-08-12 06:09:49,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-12 06:09:53,738 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 18 from Vox, 38 from AS 2024-08-12 06:09:59,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1495460.0, ans=0.05 2024-08-12 06:10:02,716 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
12 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 06:10:06,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1495460.0, ans=0.1 2024-08-12 06:10:27,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=12.0 2024-08-12 06:10:40,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1495660.0, ans=0.0 2024-08-12 06:10:43,597 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 from AS 2024-08-12 06:10:44,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4650, loss[loss=0.0975, beats_loss=0.01189, ecapa_loss=0.0001692, whisper_loss=0.08392, over 21058.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001818, whisper_loss=0.09257, over 3923858.28 frames. ], batch size: 84, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:10:59,433 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 06:11:01,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1495860.0, ans=0.0 2024-08-12 06:11:01,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1495860.0, ans=0.125 2024-08-12 06:11:12,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1495860.0, ans=0.125 2024-08-12 06:11:18,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1495960.0, ans=0.0 2024-08-12 06:11:19,692 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
40 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 06:11:48,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1496060.0, ans=0.1 2024-08-12 06:11:54,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.541e+01 2.726e+01 3.242e+01 5.233e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 06:11:55,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1496160.0, ans=0.1 2024-08-12 06:12:00,792 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 06:12:09,147 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4700, loss[loss=0.1007, beats_loss=0.01113, ecapa_loss=0.0001963, whisper_loss=0.08756, over 17659.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01106, ecapa_loss=0.0001822, whisper_loss=0.09289, over 3907512.43 frames. ], batch size: 72, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:12:13,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1496260.0, ans=0.04949747468305833 2024-08-12 06:12:19,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1496260.0, ans=0.0 2024-08-12 06:12:19,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1496260.0, ans=0.2 2024-08-12 06:12:37,856 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS 2024-08-12 06:12:38,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. 
limit=15.0 2024-08-12 06:12:44,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1496460.0, ans=0.125 2024-08-12 06:12:45,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1496460.0, ans=0.125 2024-08-12 06:13:00,171 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 06:13:18,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1496660.0, ans=0.0 2024-08-12 06:13:18,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.60 vs. limit=15.0 2024-08-12 06:13:28,019 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 06:13:30,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4750, loss[loss=0.1143, beats_loss=0.009315, ecapa_loss=0.0001547, whisper_loss=0.1035, over 15618.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01104, ecapa_loss=0.0001818, whisper_loss=0.0929, over 3899227.12 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:13:36,269 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 from AS 2024-08-12 06:13:44,136 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-12 06:13:51,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-12 06:13:51,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.35 vs. 
limit=22.5 2024-08-12 06:13:56,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.12 vs. limit=22.5 2024-08-12 06:14:06,617 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS 2024-08-12 06:14:09,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-12 06:14:12,911 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 from AS 2024-08-12 06:14:22,289 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-12 06:14:29,186 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 11 from Vox, 33 from AS 2024-08-12 06:14:35,030 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 from AS 2024-08-12 06:14:36,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.611e+01 2.918e+01 3.267e+01 6.538e+01, threshold=5.836e+01, percent-clipped=2.0 2024-08-12 06:14:51,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4800, loss[loss=0.1195, beats_loss=0.0115, ecapa_loss=0.0001532, whisper_loss=0.1065, over 18130.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.000183, whisper_loss=0.09273, over 3881423.76 frames. ], batch size: 66, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:14:58,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. 
limit=22.5 2024-08-12 06:15:21,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1497360.0, ans=0.2 2024-08-12 06:15:38,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1497460.0, ans=0.0 2024-08-12 06:15:43,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-12 06:15:58,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1497660.0, ans=0.2 2024-08-12 06:16:13,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4850, loss[loss=0.1122, beats_loss=0.01088, ecapa_loss=0.0001438, whisper_loss=0.09987, over 23681.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01096, ecapa_loss=0.0001826, whisper_loss=0.09323, over 3878730.72 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:16:25,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1497760.0, ans=0.1 2024-08-12 06:17:18,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1498160.0, ans=0.125 2024-08-12 06:17:19,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.592e+01 2.912e+01 3.180e+01 4.291e+01, threshold=5.823e+01, percent-clipped=0.0 2024-08-12 06:17:31,317 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 from AS 2024-08-12 06:17:34,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4900, loss[loss=0.09686, beats_loss=0.01114, ecapa_loss=0.0001601, whisper_loss=0.08412, over 22538.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01101, ecapa_loss=0.0001822, whisper_loss=0.09279, over 3854316.90 frames. 
], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:17:39,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-12 06:17:41,806 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 06:17:43,005 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 06:18:04,133 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 06:18:13,007 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-12 06:18:33,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1498560.0, ans=0.125 2024-08-12 06:18:36,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1498560.0, ans=0.04949747468305833 2024-08-12 06:18:48,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1498660.0, ans=0.125 2024-08-12 06:18:49,657 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 11 from Vox, 31 from AS 2024-08-12 06:18:51,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1498660.0, ans=0.125 2024-08-12 06:18:57,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 4950, loss[loss=0.1112, beats_loss=0.009877, ecapa_loss=0.0001944, whisper_loss=0.09936, over 18099.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001808, whisper_loss=0.0931, over 3841894.35 frames. 
], batch size: 72, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:19:05,103 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:19:27,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1498860.0, ans=10.0 2024-08-12 06:19:50,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1499060.0, ans=0.0 2024-08-12 06:20:01,652 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-12 06:20:04,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.693e+01 3.094e+01 3.524e+01 6.311e+01, threshold=6.188e+01, percent-clipped=2.0 2024-08-12 06:20:08,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1499160.0, ans=0.125 2024-08-12 06:20:10,073 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 from AS 2024-08-12 06:20:19,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5000, loss[loss=0.1158, beats_loss=0.01016, ecapa_loss=0.0001894, whisper_loss=0.1038, over 20750.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001797, whisper_loss=0.09303, over 3863520.81 frames. ], batch size: 81, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:20:46,205 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
22 from LS+wenet, 16 from Vox, 44 from AS 2024-08-12 06:20:47,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1499360.0, ans=0.125 2024-08-12 06:20:53,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1499460.0, ans=0.0 2024-08-12 06:21:29,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0 2024-08-12 06:21:31,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1499660.0, ans=0.1 2024-08-12 06:21:32,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2024-08-12 06:21:41,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5050, loss[loss=0.09015, beats_loss=0.011, ecapa_loss=0.000236, whisper_loss=0.07679, over 14538.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01118, ecapa_loss=0.0001794, whisper_loss=0.09214, over 3878366.39 frames. ], batch size: 64, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:21:49,573 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 06:22:08,002 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 17 from Vox, 44 from AS 2024-08-12 06:22:10,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1499860.0, ans=0.125 2024-08-12 06:22:22,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1499960.0, ans=0.125 2024-08-12 06:22:27,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.80 vs. limit=22.5 2024-08-12 06:22:35,808 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 from AS 2024-08-12 06:22:37,071 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 from AS 2024-08-12 06:22:51,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.582e+01 2.858e+01 3.371e+01 2.461e+02, threshold=5.717e+01, percent-clipped=1.0 2024-08-12 06:23:00,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1500160.0, ans=0.125 2024-08-12 06:23:05,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5100, loss[loss=0.09989, beats_loss=0.01189, ecapa_loss=0.0001901, whisper_loss=0.08611, over 22424.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01116, ecapa_loss=0.0001796, whisper_loss=0.09285, over 3905100.37 frames. ], batch size: 90, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:23:06,284 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
37 from LS+wenet, 23 from Vox, 33 from AS 2024-08-12 06:23:09,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1500260.0, ans=0.125 2024-08-12 06:23:18,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1500260.0, ans=0.0 2024-08-12 06:23:22,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1500360.0, ans=0.125 2024-08-12 06:23:56,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1500560.0, ans=0.125 2024-08-12 06:24:01,108 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 from AS 2024-08-12 06:24:02,861 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 06:24:15,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1500660.0, ans=0.125 2024-08-12 06:24:27,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5150, loss[loss=0.1024, beats_loss=0.009874, ecapa_loss=0.0001863, whisper_loss=0.09065, over 17434.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01121, ecapa_loss=0.000178, whisper_loss=0.09202, over 3878269.61 frames. 
], batch size: 70, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:24:50,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1500860.0, ans=0.0 2024-08-12 06:24:58,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1500860.0, ans=0.1 2024-08-12 06:25:13,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1500960.0, ans=0.125 2024-08-12 06:26:03,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.805e+01 3.216e+01 1.904e+02, threshold=5.610e+01, percent-clipped=1.0 2024-08-12 06:26:04,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1501160.0, ans=0.125 2024-08-12 06:26:05,152 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-12 06:26:07,416 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 06:26:17,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1501160.0, ans=0.07 2024-08-12 06:26:19,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1501160.0, ans=0.125 2024-08-12 06:26:21,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5200, loss[loss=0.09529, beats_loss=0.008418, ecapa_loss=0.0001868, whisper_loss=0.085, over 15278.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001796, whisper_loss=0.09226, over 3869067.83 frames. 
], batch size: 58, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:26:29,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1501260.0, ans=0.07 2024-08-12 06:26:30,770 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 06:26:34,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0 2024-08-12 06:26:44,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1501360.0, ans=0.125 2024-08-12 06:27:18,120 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 06:27:38,315 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 06:27:46,639 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 06:27:49,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5250, loss[loss=0.1006, beats_loss=0.01192, ecapa_loss=0.0001376, whisper_loss=0.08732, over 22814.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001807, whisper_loss=0.09169, over 3872058.36 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:27:55,774 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-12 06:28:05,879 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 06:28:15,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1501860.0, ans=0.125 2024-08-12 06:28:19,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1501860.0, ans=0.0 2024-08-12 06:28:19,968 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 06:28:24,681 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 34 from Vox, 24 fro AS 2024-08-12 06:28:25,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1501960.0, ans=0.0 2024-08-12 06:28:27,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1501960.0, ans=0.125 2024-08-12 06:28:49,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1502060.0, ans=0.125 2024-08-12 06:28:58,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.556e+01 2.811e+01 3.138e+01 9.826e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 06:29:03,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1502160.0, ans=0.0 2024-08-12 06:29:09,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1502160.0, ans=0.0 2024-08-12 06:29:09,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1502160.0, ans=0.125 2024-08-12 06:29:13,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5300, loss[loss=0.1299, beats_loss=0.01043, ecapa_loss=0.0001551, whisper_loss=0.1179, over 23163.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.0001812, whisper_loss=0.09194, over 3871982.71 frames. ], batch size: 88, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:29:19,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1502260.0, ans=0.2 2024-08-12 06:29:47,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1502460.0, ans=0.125 2024-08-12 06:29:56,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1502460.0, ans=0.125 2024-08-12 06:29:56,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1502460.0, ans=10.0 2024-08-12 06:30:25,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1502660.0, ans=0.125 2024-08-12 06:30:32,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1502660.0, ans=0.0 2024-08-12 06:30:36,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5350, loss[loss=0.1183, beats_loss=0.01032, ecapa_loss=0.0001812, whisper_loss=0.1062, over 21937.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01102, ecapa_loss=0.0001808, whisper_loss=0.09198, over 3874246.99 frames. ], batch size: 83, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:30:43,259 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 06:30:47,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1502760.0, ans=0.05 2024-08-12 06:30:50,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1502760.0, ans=0.125 2024-08-12 06:30:53,369 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.163e+02 2024-08-12 06:30:57,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2024-08-12 06:31:01,258 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-12 06:31:04,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1502860.0, ans=0.125 2024-08-12 06:31:11,203 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 06:31:11,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1502960.0, ans=0.125 2024-08-12 06:31:11,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1502960.0, ans=0.125 2024-08-12 06:31:13,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1502960.0, ans=0.125 2024-08-12 06:31:22,343 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 06:31:37,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1503060.0, ans=0.2 2024-08-12 06:31:43,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.466e+01 2.824e+01 3.264e+01 5.204e+01, threshold=5.648e+01, percent-clipped=0.0 2024-08-12 06:31:48,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1503160.0, ans=0.0 2024-08-12 06:31:50,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1503160.0, ans=0.125 2024-08-12 06:31:57,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5400, loss[loss=0.1098, beats_loss=0.008133, ecapa_loss=0.0002179, whisper_loss=0.09952, over 15742.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01102, ecapa_loss=0.0001816, whisper_loss=0.09136, over 3835144.68 frames. ], batch size: 63, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:32:38,117 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 06:32:38,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1503460.0, ans=0.125 2024-08-12 06:32:41,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1503460.0, ans=0.125 2024-08-12 06:32:44,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1503560.0, ans=0.125 2024-08-12 06:32:49,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1503560.0, ans=0.125 2024-08-12 06:32:57,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1503560.0, ans=0.2 2024-08-12 06:33:09,468 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 06:33:10,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1503660.0, ans=0.125 2024-08-12 06:33:16,536 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 06:33:17,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5450, loss[loss=0.1023, beats_loss=0.01338, ecapa_loss=0.0001323, whisper_loss=0.08758, over 23277.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001804, whisper_loss=0.09146, over 3837678.61 frames. 
], batch size: 89, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:33:33,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1503860.0, ans=0.2 2024-08-12 06:33:54,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503960.0, ans=0.1 2024-08-12 06:33:59,390 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 06:34:20,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1504160.0, ans=0.125 2024-08-12 06:34:23,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.572e+01 2.892e+01 3.418e+01 4.149e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 06:34:33,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1504160.0, ans=0.125 2024-08-12 06:34:34,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1504160.0, ans=0.125 2024-08-12 06:34:37,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5500, loss[loss=0.1094, beats_loss=0.01025, ecapa_loss=0.0001597, whisper_loss=0.09757, over 22256.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.0001802, whisper_loss=0.09166, over 3870684.05 frames. ], batch size: 86, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:34:48,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1504260.0, ans=0.0 2024-08-12 06:34:53,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. 
limit=22.5 2024-08-12 06:35:00,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1504360.0, ans=0.125 2024-08-12 06:35:06,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-12 06:35:19,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1504460.0, ans=0.125 2024-08-12 06:35:36,503 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 06:35:53,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0 2024-08-12 06:35:56,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5550, loss[loss=0.08991, beats_loss=0.009375, ecapa_loss=0.0001779, whisper_loss=0.07876, over 22575.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01113, ecapa_loss=0.0001792, whisper_loss=0.09141, over 3892346.03 frames. ], batch size: 92, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:35:58,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1504760.0, ans=0.0 2024-08-12 06:36:11,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1504860.0, ans=0.1 2024-08-12 06:36:26,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1504860.0, ans=0.09899494936611666 2024-08-12 06:36:32,783 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 06:36:34,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1504960.0, ans=0.125 2024-08-12 06:36:51,190 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 06:36:53,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1505060.0, ans=0.07 2024-08-12 06:36:55,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1505060.0, ans=0.125 2024-08-12 06:36:57,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1505060.0, ans=0.2 2024-08-12 06:36:58,862 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 06:37:01,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.511e+01 2.832e+01 3.131e+01 5.675e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 06:37:07,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505160.0, ans=0.1 2024-08-12 06:37:15,137 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 06:37:16,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5600, loss[loss=0.07824, beats_loss=0.01211, ecapa_loss=0.0001706, whisper_loss=0.06442, over 16197.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001803, whisper_loss=0.09167, over 3859026.54 frames. 
], batch size: 66, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:37:17,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1505260.0, ans=0.1 2024-08-12 06:37:21,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1505260.0, ans=0.0 2024-08-12 06:37:37,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1505360.0, ans=0.125 2024-08-12 06:37:49,813 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 06:37:50,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1505460.0, ans=0.125 2024-08-12 06:37:59,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=12.0 2024-08-12 06:38:10,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1505560.0, ans=0.125 2024-08-12 06:38:27,303 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 06:38:28,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1505660.0, ans=0.0 2024-08-12 06:38:32,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1505660.0, ans=0.125 2024-08-12 06:38:36,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.40 vs. 
limit=15.0 2024-08-12 06:38:37,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1505660.0, ans=0.1 2024-08-12 06:38:39,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5650, loss[loss=0.1052, beats_loss=0.009653, ecapa_loss=0.0002269, whisper_loss=0.09328, over 16832.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001793, whisper_loss=0.09188, over 3901041.18 frames. ], batch size: 71, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:38:43,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2024-08-12 06:38:45,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=22.5 2024-08-12 06:39:17,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1505960.0, ans=0.0 2024-08-12 06:39:19,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1505960.0, ans=0.125 2024-08-12 06:39:24,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2024-08-12 06:39:30,884 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 06:39:37,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1506060.0, ans=0.0 2024-08-12 06:39:43,222 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 06:39:44,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.537e+01 2.761e+01 3.260e+01 5.240e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-12 06:39:45,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.264e+01 2024-08-12 06:39:46,390 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 06:39:57,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1506160.0, ans=0.125 2024-08-12 06:39:57,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1506160.0, ans=0.125 2024-08-12 06:39:59,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5700, loss[loss=0.1214, beats_loss=0.0107, ecapa_loss=0.0002043, whisper_loss=0.1087, over 21020.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001798, whisper_loss=0.09233, over 3914422.32 frames. ], batch size: 84, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:40:05,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1506260.0, ans=0.0 2024-08-12 06:40:14,041 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 06:40:15,740 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 06:40:16,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-08-12 06:40:24,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. 
limit=15.0 2024-08-12 06:40:25,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1506360.0, ans=0.125 2024-08-12 06:40:34,279 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 06:40:39,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1506460.0, ans=0.2 2024-08-12 06:40:55,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1506560.0, ans=0.0 2024-08-12 06:40:59,285 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 06:41:09,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1506660.0, ans=0.09899494936611666 2024-08-12 06:41:14,370 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 06:41:21,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5750, loss[loss=0.1068, beats_loss=0.009379, ecapa_loss=0.0001971, whisper_loss=0.09542, over 19457.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01105, ecapa_loss=0.0001807, whisper_loss=0.09281, over 3924389.57 frames. ], batch size: 77, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:41:24,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1506760.0, ans=0.125 2024-08-12 06:41:38,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1506860.0, ans=0.95 2024-08-12 06:41:41,596 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
20 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 06:41:43,352 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.375e-01 2024-08-12 06:41:48,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1506860.0, ans=0.0 2024-08-12 06:42:03,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1506960.0, ans=0.04949747468305833 2024-08-12 06:42:06,465 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 06:42:11,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507060.0, ans=0.1 2024-08-12 06:42:18,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1507060.0, ans=0.125 2024-08-12 06:42:23,814 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 06:42:26,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1507160.0, ans=0.125 2024-08-12 06:42:27,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.547e+01 2.860e+01 3.182e+01 5.592e+01, threshold=5.721e+01, percent-clipped=1.0 2024-08-12 06:42:41,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5800, loss[loss=0.09888, beats_loss=0.009706, ecapa_loss=0.0002287, whisper_loss=0.08689, over 14850.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001805, whisper_loss=0.09217, over 3880932.56 frames. 
], batch size: 60, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:42:51,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1507260.0, ans=0.0 2024-08-12 06:42:53,922 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 06:43:07,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1507360.0, ans=0.125 2024-08-12 06:43:07,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=12.0 2024-08-12 06:43:11,026 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 06:43:14,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1507460.0, ans=0.125 2024-08-12 06:43:14,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1507460.0, ans=10.0 2024-08-12 06:43:34,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1507560.0, ans=0.0 2024-08-12 06:43:40,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1507660.0, ans=0.07 2024-08-12 06:43:55,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5850, loss[loss=0.112, beats_loss=0.012, ecapa_loss=0.0001729, whisper_loss=0.09825, over 22976.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.0001802, whisper_loss=0.09211, over 3909108.26 frames. 
], batch size: 93, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:44:06,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1507760.0, ans=0.2 2024-08-12 06:44:07,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1507760.0, ans=0.1 2024-08-12 06:44:15,788 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 06:44:21,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1507860.0, ans=0.125 2024-08-12 06:44:24,079 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 06:44:25,409 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 06:44:28,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1507960.0, ans=0.015 2024-08-12 06:44:32,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-12 06:44:33,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-12 06:44:36,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1507960.0, ans=0.125 2024-08-12 06:44:55,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.480e+01 2.777e+01 3.215e+01 5.489e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 06:44:57,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1508160.0, ans=15.0 2024-08-12 06:45:02,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-12 06:45:06,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5900, loss[loss=0.09301, beats_loss=0.01074, ecapa_loss=0.0001697, whisper_loss=0.08058, over 20626.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01106, ecapa_loss=0.0001812, whisper_loss=0.09237, over 3893735.98 frames. ], batch size: 82, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:45:27,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.68 vs. limit=15.0 2024-08-12 06:45:31,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1508360.0, ans=0.125 2024-08-12 06:45:59,384 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 06:46:10,250 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-12 06:46:16,887 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 5950, loss[loss=0.1151, beats_loss=0.01047, ecapa_loss=0.0001933, whisper_loss=0.1027, over 23034.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01117, ecapa_loss=0.0001819, whisper_loss=0.09176, over 3883360.49 frames. ], batch size: 94, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:46:17,469 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:46:54,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1508960.0, ans=0.1 2024-08-12 06:47:15,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.882e+01 3.325e+01 5.467e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 06:47:15,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1509160.0, ans=0.2 2024-08-12 06:47:16,823 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 06:47:26,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6000, loss[loss=0.09469, beats_loss=0.01287, ecapa_loss=0.0001589, whisper_loss=0.08023, over 22927.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01122, ecapa_loss=0.0001819, whisper_loss=0.09089, over 3906719.53 frames. ], batch size: 93, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:47:26,638 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 06:48:09,280 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000598, whisper_loss=0.2484, over 922467.00 frames. 2024-08-12 06:48:27,586 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on SV_voxceleb1: loss=0.004893, beats_loss=0, ecapa_loss=0.0004893, whisper_loss=0, over 939242.00 frames. 2024-08-12 06:50:30,768 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on AT_audioset: loss=0.02461, beats_loss=0.02461, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 06:50:30,771 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-12 06:50:35,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1509260.0, ans=0.125
2024-08-12 06:51:07,288 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS
2024-08-12 06:51:12,888 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 from AS
2024-08-12 06:51:17,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.02 vs. limit=10.0
2024-08-12 06:51:31,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0
2024-08-12 06:51:37,958 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS
2024-08-12 06:51:41,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6050, loss[loss=0.1063, beats_loss=0.01187, ecapa_loss=0.0001939, whisper_loss=0.09249, over 22187.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01116, ecapa_loss=0.0001811, whisper_loss=0.09176, over 3916627.58 frames.
], batch size: 95, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:51:42,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1509760.0, ans=0.125
2024-08-12 06:52:02,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1509860.0, ans=0.0
2024-08-12 06:52:02,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1509860.0, ans=0.125
2024-08-12 06:52:13,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1509960.0, ans=0.0
2024-08-12 06:52:25,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1510060.0, ans=0.1
2024-08-12 06:52:39,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.536e+01 2.768e+01 3.094e+01 4.494e+01, threshold=5.536e+01, percent-clipped=0.0
2024-08-12 06:52:50,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6100, loss[loss=0.1123, beats_loss=0.009954, ecapa_loss=0.0001631, whisper_loss=0.1008, over 22799.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01114, ecapa_loss=0.0001816, whisper_loss=0.09177, over 3943615.95 frames. ], batch size: 88, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:52:55,189 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 from AS
2024-08-12 06:53:07,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1510360.0, ans=0.0
2024-08-12 06:53:16,962 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-12 06:53:17,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0
2024-08-12 06:53:22,372 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 14 from Vox, 37 from AS
2024-08-12 06:53:50,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1510660.0, ans=0.125
2024-08-12 06:53:56,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1510660.0, ans=0.07
2024-08-12 06:54:00,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6150, loss[loss=0.1158, beats_loss=0.01089, ecapa_loss=0.0001817, whisper_loss=0.1031, over 20785.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01117, ecapa_loss=0.0001812, whisper_loss=0.09106, over 3932756.69 frames. ], batch size: 84, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:54:21,842 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 06:54:23,165 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-12 06:54:39,296 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS
2024-08-12 06:54:57,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.648e+01 2.944e+01 3.398e+01 5.258e+01, threshold=5.887e+01, percent-clipped=0.0
2024-08-12 06:55:08,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6200, loss[loss=0.1033, beats_loss=0.0113, ecapa_loss=0.000171, whisper_loss=0.09028, over 22144.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001818, whisper_loss=0.09185, over 3913580.20 frames.
], batch size: 88, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:55:30,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1511360.0, ans=0.125
2024-08-12 06:55:37,665 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 from AS
2024-08-12 06:55:37,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1511460.0, ans=0.1
2024-08-12 06:55:38,975 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 from AS
2024-08-12 06:55:40,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1511460.0, ans=0.2
2024-08-12 06:55:45,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1511460.0, ans=0.125
2024-08-12 06:56:00,763 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 from AS
2024-08-12 06:56:09,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1511660.0, ans=0.0
2024-08-12 06:56:11,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1511660.0, ans=0.0
2024-08-12 06:56:17,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6250, loss[loss=0.08275, beats_loss=0.01405, ecapa_loss=0.0001614, whisper_loss=0.06709, over 19184.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01115, ecapa_loss=0.0001821, whisper_loss=0.09098, over 3898274.88 frames.
], batch size: 81, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:56:36,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1511860.0, ans=0.2
2024-08-12 06:57:07,113 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 from AS
2024-08-12 06:57:16,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.481e+01 2.801e+01 3.369e+01 5.530e+01, threshold=5.602e+01, percent-clipped=0.0
2024-08-12 06:57:26,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1512160.0, ans=0.025
2024-08-12 06:57:28,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6300, loss[loss=0.1151, beats_loss=0.00965, ecapa_loss=0.0001728, whisper_loss=0.1037, over 16423.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0111, ecapa_loss=0.0001821, whisper_loss=0.09179, over 3892672.85 frames. ], batch size: 63, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:57:32,475 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 from AS
2024-08-12 06:57:36,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1512260.0, ans=0.125
2024-08-12 06:57:36,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1512260.0, ans=0.0
2024-08-12 06:57:54,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1512360.0, ans=0.125
2024-08-12 06:58:06,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs.
limit=22.5
2024-08-12 06:58:08,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1512460.0, ans=0.07
2024-08-12 06:58:20,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1512560.0, ans=0.2
2024-08-12 06:58:26,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1512660.0, ans=0.125
2024-08-12 06:58:27,111 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-12 06:58:38,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0
2024-08-12 06:58:40,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6350, loss[loss=0.09176, beats_loss=0.01464, ecapa_loss=0.0002267, whisper_loss=0.07485, over 20887.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.000183, whisper_loss=0.09207, over 3869749.16 frames. ], batch size: 94, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:59:41,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1513160.0, ans=0.1
2024-08-12 06:59:42,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.576e+01 2.857e+01 3.160e+01 6.267e+01, threshold=5.713e+01, percent-clipped=1.0
2024-08-12 06:59:51,230 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS
2024-08-12 06:59:53,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6400, loss[loss=0.09815, beats_loss=0.01084, ecapa_loss=0.0001589, whisper_loss=0.08571, over 18652.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001822, whisper_loss=0.09189, over 3857030.26 frames.
], batch size: 70, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:59:54,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1513260.0, ans=0.125
2024-08-12 07:00:07,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1513360.0, ans=0.0
2024-08-12 07:00:21,581 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 from AS
2024-08-12 07:00:52,802 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 12 from Vox, 39 from AS
2024-08-12 07:01:02,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5
2024-08-12 07:01:07,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6450, loss[loss=0.1056, beats_loss=0.01052, ecapa_loss=0.0001813, whisper_loss=0.09324, over 22424.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.000183, whisper_loss=0.09242, over 3852031.53 frames. ], batch size: 92, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:01:29,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1513860.0, ans=0.1
2024-08-12 07:01:34,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5
2024-08-12 07:01:36,955 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
23 from LS+wenet, 16 from Vox, 28 from AS
2024-08-12 07:01:59,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1514060.0, ans=0.125
2024-08-12 07:02:09,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5
2024-08-12 07:02:10,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.620e+01 2.898e+01 3.369e+01 4.608e+01, threshold=5.796e+01, percent-clipped=0.0
2024-08-12 07:02:22,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6500, loss[loss=0.1149, beats_loss=0.01056, ecapa_loss=0.0001669, whisper_loss=0.1026, over 23808.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01101, ecapa_loss=0.0001818, whisper_loss=0.0937, over 3849071.88 frames. ], batch size: 90, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:02:55,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1514460.0, ans=0.0
2024-08-12 07:03:08,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514560.0, ans=0.1
2024-08-12 07:03:15,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1514560.0, ans=0.2
2024-08-12 07:03:15,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1514560.0, ans=0.0
2024-08-12 07:03:24,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1514660.0, ans=0.125
2024-08-12 07:03:24,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs.
limit=15.0
2024-08-12 07:03:37,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6550, loss[loss=0.0808, beats_loss=0.01246, ecapa_loss=0.0001682, whisper_loss=0.06666, over 21603.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001816, whisper_loss=0.09278, over 3875840.42 frames. ], batch size: 90, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:03:39,620 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS
2024-08-12 07:03:50,114 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 16 from Vox, 41 from AS
2024-08-12 07:03:50,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1514860.0, ans=0.125
2024-08-12 07:03:57,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1514860.0, ans=0.05
2024-08-12 07:04:13,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1514960.0, ans=0.05
2024-08-12 07:04:16,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514960.0, ans=0.1
2024-08-12 07:04:22,893 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
29 from LS+wenet, 28 from Vox, 32 from AS
2024-08-12 07:04:28,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1515060.0, ans=0.125
2024-08-12 07:04:31,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1515060.0, ans=0.125
2024-08-12 07:04:43,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1515160.0, ans=0.125
2024-08-12 07:04:47,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.605e+01 2.821e+01 3.389e+01 5.277e+01, threshold=5.643e+01, percent-clipped=0.0
2024-08-12 07:04:47,860 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 from AS
2024-08-12 07:04:55,461 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 32 from Vox, 30 from AS
2024-08-12 07:05:02,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6600, loss[loss=0.1069, beats_loss=0.01199, ecapa_loss=0.0001452, whisper_loss=0.09348, over 23670.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001831, whisper_loss=0.09237, over 3892076.06 frames.
], batch size: 93, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:05:13,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1515260.0, ans=0.125
2024-08-12 07:05:18,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1515360.0, ans=0.035
2024-08-12 07:05:18,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1515360.0, ans=0.0
2024-08-12 07:05:29,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1515360.0, ans=0.125
2024-08-12 07:05:35,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1515360.0, ans=0.125
2024-08-12 07:06:08,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5
2024-08-12 07:06:15,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1515660.0, ans=0.2
2024-08-12 07:06:22,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6650, loss[loss=0.08784, beats_loss=0.01049, ecapa_loss=0.000214, whisper_loss=0.0752, over 16082.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01106, ecapa_loss=0.0001836, whisper_loss=0.09272, over 3919694.40 frames. ], batch size: 65, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:06:26,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1515760.0, ans=0.0
2024-08-12 07:06:56,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs.
limit=12.0
2024-08-12 07:07:18,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0
2024-08-12 07:07:38,064 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS
2024-08-12 07:07:42,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.66 vs. limit=10.0
2024-08-12 07:07:44,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1516160.0, ans=0.125
2024-08-12 07:07:48,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.716e+01 3.038e+01 3.399e+01 5.348e+01, threshold=6.076e+01, percent-clipped=0.0
2024-08-12 07:08:06,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6700, loss[loss=0.1081, beats_loss=0.01204, ecapa_loss=0.0001553, whisper_loss=0.09455, over 22066.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01109, ecapa_loss=0.0001808, whisper_loss=0.09272, over 3923901.69 frames. ], batch size: 89, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:08:27,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1516360.0, ans=0.1
2024-08-12 07:08:57,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1516460.0, ans=0.125
2024-08-12 07:09:29,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1516660.0, ans=0.125
2024-08-12 07:09:36,873 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
17 from LS+wenet, 19 from Vox, 38 from AS
2024-08-12 07:09:43,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6750, loss[loss=0.1207, beats_loss=0.008868, ecapa_loss=0.0001798, whisper_loss=0.11, over 20495.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001815, whisper_loss=0.09292, over 3891237.00 frames. ], batch size: 78, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:09:49,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1516760.0, ans=6.0
2024-08-12 07:09:54,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0
2024-08-12 07:09:56,964 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 from AS
2024-08-12 07:10:08,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1516860.0, ans=0.0
2024-08-12 07:10:20,652 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 07:10:25,373 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
13 from LS+wenet, 18 from Vox, 31 from AS
2024-08-12 07:10:38,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1516960.0, ans=0.125
2024-08-12 07:10:50,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1517060.0, ans=0.2
2024-08-12 07:11:07,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.593e+01 2.755e+01 3.178e+01 4.521e+01, threshold=5.509e+01, percent-clipped=0.0
2024-08-12 07:11:12,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1517160.0, ans=0.125
2024-08-12 07:11:15,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-08-12 07:11:24,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1517260.0, ans=0.0
2024-08-12 07:11:24,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0
2024-08-12 07:11:24,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6800, loss[loss=0.1077, beats_loss=0.009744, ecapa_loss=0.0001677, whisper_loss=0.0963, over 15525.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0001805, whisper_loss=0.09289, over 3906817.65 frames. ], batch size: 60, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:11:29,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0
2024-08-12 07:11:45,539 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
15 from LS+wenet, 19 from Vox, 21 from AS
2024-08-12 07:12:02,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0
2024-08-12 07:12:14,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. limit=10.0
2024-08-12 07:12:16,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.55 vs. limit=10.0
2024-08-12 07:12:19,136 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 from AS
2024-08-12 07:12:23,372 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 12 from Vox, 31 from AS
2024-08-12 07:12:35,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1517660.0, ans=0.125
2024-08-12 07:12:41,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6850, loss[loss=0.1256, beats_loss=0.007411, ecapa_loss=0.0002152, whisper_loss=0.1161, over 15681.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01102, ecapa_loss=0.00018, whisper_loss=0.09271, over 3871199.63 frames. ], batch size: 62, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:12:43,651 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.594e-03
2024-08-12 07:13:00,421 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
25 from LS+wenet, 27 from Vox, 37 from AS
2024-08-12 07:13:09,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1517960.0, ans=0.125
2024-08-12 07:13:13,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1517960.0, ans=0.125
2024-08-12 07:13:19,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2024-08-12 07:13:30,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1518060.0, ans=0.125
2024-08-12 07:13:32,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1518060.0, ans=0.125
2024-08-12 07:13:33,872 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 from AS
2024-08-12 07:13:37,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1518060.0, ans=0.0
2024-08-12 07:13:42,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.608e+01 2.859e+01 3.356e+01 1.905e+02, threshold=5.718e+01, percent-clipped=1.0
2024-08-12 07:13:48,929 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 28 from Vox, 33 from AS
2024-08-12 07:13:53,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6900, loss[loss=0.1222, beats_loss=0.009804, ecapa_loss=0.0001396, whisper_loss=0.111, over 24265.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01104, ecapa_loss=0.0001818, whisper_loss=0.09206, over 3853934.31 frames. ], batch size: 89, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:14:06,372 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts.
23 from LS+wenet, 18 from Vox, 23 from AS
2024-08-12 07:14:38,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1518560.0, ans=0.125
2024-08-12 07:15:04,668 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 17 from Vox, 45 from AS
2024-08-12 07:15:05,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 6950, loss[loss=0.08638, beats_loss=0.01412, ecapa_loss=0.0001425, whisper_loss=0.07084, over 20367.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01113, ecapa_loss=0.0001809, whisper_loss=0.09148, over 3858981.21 frames. ], batch size: 82, lr: 5.74e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:15:17,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1518760.0, ans=0.1
2024-08-12 07:15:43,372 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 17 from Vox, 51 from AS
2024-08-12 07:15:49,021 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS
2024-08-12 07:15:49,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0
2024-08-12 07:15:49,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0
2024-08-12 07:15:50,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1519060.0, ans=0.0
2024-08-12 07:15:53,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1519060.0, ans=0.125
2024-08-12 07:15:57,137 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
16 from LS+wenet, 19 from Vox, 21 from AS
2024-08-12 07:15:58,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1519060.0, ans=0.125
2024-08-12 07:16:01,227 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 from AS
2024-08-12 07:16:04,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.463e+01 2.859e+01 3.118e+01 2.003e+02, threshold=5.718e+01, percent-clipped=2.0
2024-08-12 07:16:06,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1519160.0, ans=0.0
2024-08-12 07:16:14,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7000, loss[loss=0.09924, beats_loss=0.008971, ecapa_loss=0.0002202, whisper_loss=0.08807, over 16174.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001819, whisper_loss=0.09225, over 3854376.12 frames. ], batch size: 64, lr: 5.74e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:17:13,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1519660.0, ans=0.0
2024-08-12 07:17:25,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7050, loss[loss=0.1125, beats_loss=0.01127, ecapa_loss=0.0001957, whisper_loss=0.09929, over 16441.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01107, ecapa_loss=0.0001819, whisper_loss=0.09233, over 3831880.46 frames. ], batch size: 67, lr: 5.74e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:17:38,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1519860.0, ans=0.125
2024-08-12 07:17:39,721 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS
2024-08-12 07:17:51,406 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
19 from LS+wenet, 17 from Vox, 23 from AS
2024-08-12 07:17:53,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1519960.0, ans=0.2
2024-08-12 07:18:08,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-08-12 07:18:19,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1520060.0, ans=0.125
2024-08-12 07:18:25,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5
2024-08-12 07:18:28,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.499e+01 2.770e+01 3.110e+01 4.662e+01, threshold=5.540e+01, percent-clipped=0.0
2024-08-12 07:18:35,228 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 07:18:35,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1520160.0, ans=0.125
2024-08-12 07:18:40,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7100, loss[loss=0.09533, beats_loss=0.01317, ecapa_loss=0.000183, whisper_loss=0.08033, over 21716.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.000181, whisper_loss=0.09223, over 3827628.60 frames. ], batch size: 90, lr: 5.74e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:18:41,902 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 07:18:49,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1520260.0, ans=0.0 2024-08-12 07:18:53,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1520360.0, ans=0.09899494936611666 2024-08-12 07:18:54,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1520360.0, ans=0.1 2024-08-12 07:18:56,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-12 07:19:04,672 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 07:19:47,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1520660.0, ans=0.125 2024-08-12 07:19:54,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7150, loss[loss=0.123, beats_loss=0.00917, ecapa_loss=0.0001963, whisper_loss=0.1119, over 19344.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001819, whisper_loss=0.09258, over 3852266.64 frames. ], batch size: 76, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:20:10,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1520860.0, ans=0.0 2024-08-12 07:20:18,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1520860.0, ans=0.2 2024-08-12 07:20:25,786 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 07:20:41,438 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-12 07:20:47,403 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
23 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-12 07:20:54,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1521160.0, ans=0.125 2024-08-12 07:20:55,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.570e+01 2.914e+01 3.125e+01 1.770e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 07:20:55,932 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 07:21:07,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7200, loss[loss=0.1182, beats_loss=0.01013, ecapa_loss=0.0001615, whisper_loss=0.1064, over 23276.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01119, ecapa_loss=0.0001816, whisper_loss=0.09212, over 3854564.99 frames. ], batch size: 91, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:21:42,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.78 vs. limit=10.0 2024-08-12 07:21:58,532 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 07:22:00,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1521560.0, ans=0.2 2024-08-12 07:22:12,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1521660.0, ans=0.07 2024-08-12 07:22:22,196 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7250, loss[loss=0.1158, beats_loss=0.00849, ecapa_loss=0.000222, whisper_loss=0.105, over 21811.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01116, ecapa_loss=0.0001816, whisper_loss=0.092, over 3895435.09 frames. 
], batch size: 88, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:22:33,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1521760.0, ans=0.04949747468305833 2024-08-12 07:22:58,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1521960.0, ans=0.025 2024-08-12 07:23:09,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1522060.0, ans=0.2 2024-08-12 07:23:15,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2024-08-12 07:23:21,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.03 vs. limit=22.5 2024-08-12 07:23:24,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.489e+01 2.804e+01 3.145e+01 4.718e+01, threshold=5.607e+01, percent-clipped=0.0 2024-08-12 07:23:36,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7300, loss[loss=0.113, beats_loss=0.01121, ecapa_loss=0.0001482, whisper_loss=0.1003, over 16497.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01115, ecapa_loss=0.0001811, whisper_loss=0.09261, over 3901144.83 frames. ], batch size: 64, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:23:37,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1522260.0, ans=0.0 2024-08-12 07:23:38,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1522260.0, ans=0.05 2024-08-12 07:23:51,079 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 07:23:58,320 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 07:24:07,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1522460.0, ans=0.125 2024-08-12 07:24:35,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1522660.0, ans=0.0 2024-08-12 07:24:42,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-12 07:24:43,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1522660.0, ans=0.125 2024-08-12 07:24:49,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7350, loss[loss=0.1059, beats_loss=0.01209, ecapa_loss=0.0001545, whisper_loss=0.09228, over 23788.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001824, whisper_loss=0.09254, over 3895305.39 frames. 
], batch size: 94, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:24:51,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1522760.0, ans=0.125 2024-08-12 07:24:58,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1522760.0, ans=0.035 2024-08-12 07:25:04,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1522860.0, ans=0.1 2024-08-12 07:25:33,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1523060.0, ans=0.125 2024-08-12 07:25:36,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1523060.0, ans=0.125 2024-08-12 07:25:37,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.17 vs. limit=10.0 2024-08-12 07:25:51,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1523160.0, ans=0.5 2024-08-12 07:25:52,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.567e+01 3.033e+01 3.476e+01 4.624e+01, threshold=6.066e+01, percent-clipped=0.0 2024-08-12 07:25:59,711 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 07:26:03,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7400, loss[loss=0.0701, beats_loss=0.01113, ecapa_loss=0.0002225, whisper_loss=0.05675, over 15347.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001815, whisper_loss=0.09302, over 3903403.24 frames. 
], batch size: 65, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:26:46,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1523560.0, ans=0.1 2024-08-12 07:26:49,169 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 07:26:52,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-12 07:26:53,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1523560.0, ans=0.05 2024-08-12 07:27:03,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1523660.0, ans=0.125 2024-08-12 07:27:03,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1523660.0, ans=0.0 2024-08-12 07:27:10,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1523660.0, ans=10.0 2024-08-12 07:27:17,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1523760.0, ans=0.125 2024-08-12 07:27:18,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7450, loss[loss=0.09157, beats_loss=0.01029, ecapa_loss=0.0002021, whisper_loss=0.07926, over 16380.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.0001807, whisper_loss=0.09313, over 3878871.30 frames. 
], batch size: 68, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:27:20,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:22,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-12 07:27:26,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:30,781 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 31 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 07:27:41,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1523860.0, ans=0.0 2024-08-12 07:28:12,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1524060.0, ans=0.2 2024-08-12 07:28:17,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2024-08-12 07:28:20,893 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.662e+01 2.945e+01 3.324e+01 4.940e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 07:28:26,463 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 07:28:31,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7500, loss[loss=0.1044, beats_loss=0.01341, ecapa_loss=0.0001411, whisper_loss=0.08959, over 16043.00 frames. ], tot_loss[loss=0.106, beats_loss=0.011, ecapa_loss=0.0001819, whisper_loss=0.0932, over 3835892.33 frames. 
], batch size: 64, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:28:37,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-12 07:28:44,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1524260.0, ans=0.125 2024-08-12 07:28:46,732 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 07:28:49,865 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 07:28:53,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2024-08-12 07:28:54,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-12 07:29:02,384 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 07:29:23,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-08-12 07:29:26,635 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 07:29:39,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1524660.0, ans=0.125 2024-08-12 07:29:43,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7550, loss[loss=0.1084, beats_loss=0.01339, ecapa_loss=0.0001495, whisper_loss=0.09347, over 20945.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01102, ecapa_loss=0.0001819, whisper_loss=0.09267, over 3844186.84 frames. 
], batch size: 85, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:29:53,825 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 07:30:01,107 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-12 07:30:05,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1524860.0, ans=0.07 2024-08-12 07:30:16,987 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 07:30:25,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1524960.0, ans=0.04949747468305833 2024-08-12 07:30:29,598 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 07:30:44,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1525160.0, ans=0.1 2024-08-12 07:30:46,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.510e+01 2.746e+01 3.098e+01 2.240e+02, threshold=5.492e+01, percent-clipped=2.0 2024-08-12 07:30:51,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1525160.0, ans=0.125 2024-08-12 07:30:58,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7600, loss[loss=0.1094, beats_loss=0.00908, ecapa_loss=0.000202, whisper_loss=0.09833, over 22988.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001819, whisper_loss=0.09257, over 3864619.98 frames. 
], batch size: 91, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:31:00,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1525260.0, ans=0.125 2024-08-12 07:31:16,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-12 07:31:28,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1525460.0, ans=0.125 2024-08-12 07:31:32,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1525460.0, ans=0.1 2024-08-12 07:31:42,971 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 07:31:49,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1525560.0, ans=0.1 2024-08-12 07:31:52,216 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 07:31:52,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1525560.0, ans=0.0 2024-08-12 07:31:53,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1525560.0, ans=0.125 2024-08-12 07:32:02,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1525660.0, ans=0.0 2024-08-12 07:32:05,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1525660.0, ans=0.125 2024-08-12 07:32:12,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7650, loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001623, whisper_loss=0.09247, over 19042.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001802, whisper_loss=0.09155, over 3847162.00 frames. ], batch size: 73, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:32:36,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1525860.0, ans=0.07 2024-08-12 07:32:37,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1525860.0, ans=0.125 2024-08-12 07:32:40,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1525960.0, ans=0.0 2024-08-12 07:32:53,180 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 07:32:54,787 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-12 07:32:58,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1526060.0, ans=0.2 2024-08-12 07:33:13,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.517e+01 2.819e+01 3.143e+01 1.705e+02, threshold=5.638e+01, percent-clipped=1.0 2024-08-12 07:33:15,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1526160.0, ans=0.125 2024-08-12 07:33:15,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5 2024-08-12 07:33:20,755 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 07:33:23,912 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 07:33:25,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7700, loss[loss=0.104, beats_loss=0.01108, ecapa_loss=0.0001816, whisper_loss=0.0911, over 17566.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001803, whisper_loss=0.09176, over 3846145.81 frames. ], batch size: 69, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:33:43,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1526360.0, ans=0.0 2024-08-12 07:33:45,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. 
limit=15.0 2024-08-12 07:34:05,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1526460.0, ans=0.125 2024-08-12 07:34:30,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1526660.0, ans=0.125 2024-08-12 07:34:42,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7750, loss[loss=0.08644, beats_loss=0.01476, ecapa_loss=0.0001207, whisper_loss=0.07047, over 19628.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001788, whisper_loss=0.0921, over 3845764.09 frames. ], batch size: 79, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:34:49,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1526760.0, ans=0.125 2024-08-12 07:34:49,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-12 07:34:59,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1526860.0, ans=0.0 2024-08-12 07:35:03,175 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 07:35:15,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1526960.0, ans=0.2 2024-08-12 07:35:42,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1527160.0, ans=0.1 2024-08-12 07:35:44,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.468e+01 2.726e+01 3.182e+01 4.341e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 07:35:56,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7800, loss[loss=0.07908, beats_loss=0.01269, ecapa_loss=0.0001645, whisper_loss=0.06474, over 19934.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01097, ecapa_loss=0.0001784, whisper_loss=0.09202, over 3832734.21 frames. ], batch size: 81, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:36:16,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1527360.0, ans=0.125 2024-08-12 07:36:16,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1527360.0, ans=0.125 2024-08-12 07:36:18,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1527360.0, ans=0.0 2024-08-12 07:36:21,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0 2024-08-12 07:36:26,282 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 07:36:33,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. 
limit=10.0 2024-08-12 07:36:38,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1527460.0, ans=0.0 2024-08-12 07:36:52,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-12 07:37:02,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1527660.0, ans=0.125 2024-08-12 07:37:02,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1527660.0, ans=0.0 2024-08-12 07:37:09,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7850, loss[loss=0.09729, beats_loss=0.01017, ecapa_loss=0.0001715, whisper_loss=0.0854, over 14057.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001781, whisper_loss=0.0924, over 3854326.89 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:37:14,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1527760.0, ans=0.0 2024-08-12 07:37:33,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1527860.0, ans=0.2 2024-08-12 07:37:35,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2024-08-12 07:37:40,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1527960.0, ans=0.0 2024-08-12 07:37:46,181 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 07:37:52,751 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 07:37:55,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1528060.0, ans=0.0 2024-08-12 07:38:03,239 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 07:38:06,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1528060.0, ans=0.0 2024-08-12 07:38:10,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1528160.0, ans=0.125 2024-08-12 07:38:13,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.514e+01 2.915e+01 3.388e+01 6.482e+01, threshold=5.829e+01, percent-clipped=1.0 2024-08-12 07:38:25,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7900, loss[loss=0.1174, beats_loss=0.009925, ecapa_loss=0.0001957, whisper_loss=0.1055, over 17254.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001783, whisper_loss=0.09212, over 3854877.79 frames. ], batch size: 68, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:38:30,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.52 vs. limit=22.5 2024-08-12 07:38:33,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1528260.0, ans=0.2 2024-08-12 07:38:33,929 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 07:38:35,210 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 07:38:43,237 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:38:57,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528460.0, ans=0.1 2024-08-12 07:39:00,378 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 07:39:25,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1528660.0, ans=0.125 2024-08-12 07:39:37,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 7950, loss[loss=0.09834, beats_loss=0.01174, ecapa_loss=0.0002481, whisper_loss=0.08412, over 22021.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001779, whisper_loss=0.09212, over 3871601.78 frames. ], batch size: 93, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:39:49,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528760.0, ans=0.1 2024-08-12 07:39:54,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1528860.0, ans=0.125 2024-08-12 07:39:57,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1528860.0, ans=0.125 2024-08-12 07:40:06,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1528960.0, ans=0.0 2024-08-12 07:40:07,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1528960.0, ans=0.125 2024-08-12 07:40:12,496 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.476e-01 2024-08-12 07:40:17,269 INFO 
[scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1528960.0, ans=0.125 2024-08-12 07:40:31,473 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 07:40:40,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.548e+01 3.030e+01 3.373e+01 4.598e+01, threshold=6.060e+01, percent-clipped=0.0 2024-08-12 07:40:41,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1529160.0, ans=0.1 2024-08-12 07:40:52,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8000, loss[loss=0.1001, beats_loss=0.01247, ecapa_loss=0.0001588, whisper_loss=0.08604, over 21290.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01114, ecapa_loss=0.0001781, whisper_loss=0.09197, over 3844326.07 frames. ], batch size: 87, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:40:55,722 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 07:40:57,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1529260.0, ans=0.0 2024-08-12 07:41:12,653 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 07:41:17,042 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 07:41:20,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1529360.0, ans=0.0 2024-08-12 07:41:24,851 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 07:41:32,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1529460.0, ans=0.1 2024-08-12 07:41:35,031 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 07:41:41,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1529560.0, ans=10.0 2024-08-12 07:41:45,318 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 07:41:57,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529660.0, ans=0.125 2024-08-12 07:41:59,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1529660.0, ans=0.125 2024-08-12 07:42:02,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2024-08-12 07:42:07,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8050, loss[loss=0.1007, beats_loss=0.01173, ecapa_loss=0.0001529, whisper_loss=0.08746, over 22869.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.0001784, whisper_loss=0.09251, over 3865442.86 frames. ], batch size: 91, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:42:15,267 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 07:42:24,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529860.0, ans=0.125 2024-08-12 07:42:30,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1529860.0, ans=0.125 2024-08-12 07:42:34,078 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 07:42:46,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1529960.0, ans=0.1 2024-08-12 07:43:00,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1530060.0, ans=0.125 2024-08-12 07:43:00,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1530060.0, ans=0.125 2024-08-12 07:43:08,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.431e+01 2.661e+01 3.080e+01 6.684e+01, threshold=5.323e+01, percent-clipped=1.0 2024-08-12 07:43:21,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8100, loss[loss=0.1204, beats_loss=0.01021, ecapa_loss=0.000159, whisper_loss=0.1086, over 19897.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001791, whisper_loss=0.09158, over 3830032.78 frames. 
], batch size: 76, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:43:25,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1530260.0, ans=0.125 2024-08-12 07:43:31,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1530260.0, ans=0.0 2024-08-12 07:43:40,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.05 vs. limit=15.0 2024-08-12 07:43:45,629 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 07:43:59,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1530460.0, ans=0.2 2024-08-12 07:44:04,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1530460.0, ans=0.125 2024-08-12 07:44:11,156 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 07:44:16,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1530560.0, ans=0.1 2024-08-12 07:44:19,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1530560.0, ans=0.125 2024-08-12 07:44:19,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1530560.0, ans=0.125 2024-08-12 07:44:25,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1530660.0, ans=0.09899494936611666 2024-08-12 07:44:28,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1530660.0, ans=0.1 2024-08-12 07:44:36,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-08-12 07:44:37,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8150, loss[loss=0.1074, beats_loss=0.01327, ecapa_loss=0.000137, whisper_loss=0.09271, over 23383.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.0001786, whisper_loss=0.09194, over 3825681.73 frames. ], batch size: 90, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:44:44,796 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 07:45:06,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1530960.0, ans=0.1 2024-08-12 07:45:14,484 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 07:45:16,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1530960.0, ans=0.2 2024-08-12 07:45:20,445 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 07:45:25,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1531060.0, ans=0.125 2024-08-12 07:45:27,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1531060.0, ans=0.125 2024-08-12 07:45:38,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.559e+01 2.858e+01 3.192e+01 6.698e+01, threshold=5.715e+01, percent-clipped=1.0 2024-08-12 07:45:50,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8200, loss[loss=0.1075, beats_loss=0.01069, ecapa_loss=0.0001895, whisper_loss=0.09491, over 23221.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01106, ecapa_loss=0.0001792, whisper_loss=0.09203, over 3856685.91 frames. ], batch size: 92, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:45:53,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1531260.0, ans=0.125 2024-08-12 07:46:23,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1531460.0, ans=0.125 2024-08-12 07:46:26,001 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 07:46:29,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1531460.0, ans=0.125 2024-08-12 07:46:54,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1531660.0, ans=0.125 2024-08-12 07:47:01,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8250, loss[loss=0.09604, beats_loss=0.0108, ecapa_loss=0.0001788, whisper_loss=0.08345, over 23056.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01118, ecapa_loss=0.0001774, whisper_loss=0.09175, over 3882592.86 frames. ], batch size: 92, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:47:29,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1531860.0, ans=0.2 2024-08-12 07:47:46,027 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 07:47:47,446 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 07:47:55,401 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 07:47:55,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1532060.0, ans=0.5 2024-08-12 07:48:03,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.593e+01 2.850e+01 3.386e+01 5.334e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-12 07:48:14,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8300, loss[loss=0.1055, beats_loss=0.01065, ecapa_loss=0.0001585, whisper_loss=0.09329, over 17023.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001767, whisper_loss=0.09193, over 3865522.29 frames. 
], batch size: 65, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:49:06,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-12 07:49:23,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8350, loss[loss=0.09736, beats_loss=0.008175, ecapa_loss=0.0002381, whisper_loss=0.0868, over 16563.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001786, whisper_loss=0.09225, over 3874059.80 frames. ], batch size: 71, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:49:23,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1532760.0, ans=0.125 2024-08-12 07:49:28,468 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-12 07:49:52,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1532960.0, ans=0.2 2024-08-12 07:49:57,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1532960.0, ans=0.0 2024-08-12 07:50:00,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1532960.0, ans=0.1 2024-08-12 07:50:11,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1533060.0, ans=0.125 2024-08-12 07:50:20,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2024-08-12 07:50:22,525 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
32 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 07:50:23,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.502e+01 2.920e+01 3.300e+01 7.763e+01, threshold=5.841e+01, percent-clipped=2.0 2024-08-12 07:50:28,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1533160.0, ans=0.07 2024-08-12 07:50:33,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8400, loss[loss=0.08804, beats_loss=0.009945, ecapa_loss=0.0001983, whisper_loss=0.07611, over 19513.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001794, whisper_loss=0.09304, over 3879299.97 frames. ], batch size: 85, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:50:46,841 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 07:50:50,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1533360.0, ans=0.125 2024-08-12 07:50:50,228 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.273e+05 2024-08-12 07:50:58,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1533360.0, ans=0.1 2024-08-12 07:51:00,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1533360.0, ans=0.025 2024-08-12 07:51:24,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. 
limit=15.0 2024-08-12 07:51:34,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1533660.0, ans=10.0 2024-08-12 07:51:34,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1533660.0, ans=0.2 2024-08-12 07:51:45,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8450, loss[loss=0.08657, beats_loss=0.009737, ecapa_loss=0.0002016, whisper_loss=0.07482, over 18237.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.0001791, whisper_loss=0.09295, over 3876235.59 frames. ], batch size: 76, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:51:47,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1533760.0, ans=15.0 2024-08-12 07:51:48,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1533760.0, ans=0.0 2024-08-12 07:51:52,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1533760.0, ans=0.125 2024-08-12 07:51:54,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1533760.0, ans=0.125 2024-08-12 07:52:02,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1533860.0, ans=0.125 2024-08-12 07:52:25,713 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 07:52:32,584 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 07:52:34,031 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-12 07:52:46,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1534160.0, ans=0.125 2024-08-12 07:52:46,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.460e+01 2.717e+01 3.180e+01 4.918e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 07:52:53,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1534160.0, ans=0.0 2024-08-12 07:52:56,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8500, loss[loss=0.09562, beats_loss=0.01099, ecapa_loss=0.0001534, whisper_loss=0.08309, over 21394.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001793, whisper_loss=0.09277, over 3890690.33 frames. ], batch size: 83, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:52:59,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1534260.0, ans=0.125 2024-08-12 07:53:05,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1534260.0, ans=0.1 2024-08-12 07:53:06,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1534260.0, ans=0.125 2024-08-12 07:53:17,968 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 07:53:20,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-08-12 07:53:33,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1534460.0, ans=0.0 2024-08-12 07:53:44,888 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 07:53:58,030 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 07:54:06,279 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 07:54:07,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8550, loss[loss=0.109, beats_loss=0.01202, ecapa_loss=0.0001448, whisper_loss=0.09555, over 21871.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001797, whisper_loss=0.09289, over 3870564.88 frames. ], batch size: 83, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:54:16,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2024-08-12 07:54:25,001 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 07:54:29,159 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 07:54:55,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1535060.0, ans=0.125 2024-08-12 07:55:00,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1535060.0, ans=0.5 2024-08-12 07:55:09,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.569e+01 2.940e+01 3.392e+01 6.119e+01, threshold=5.880e+01, percent-clipped=2.0 2024-08-12 07:55:09,700 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 17 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 07:55:19,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8600, loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0001914, whisper_loss=0.09291, over 22003.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01107, ecapa_loss=0.0001804, whisper_loss=0.09301, over 3864183.10 frames. ], batch size: 90, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:55:21,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535260.0, ans=0.1 2024-08-12 07:55:30,666 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 07:55:45,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1535360.0, ans=0.125 2024-08-12 07:55:48,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1535460.0, ans=0.0 2024-08-12 07:56:05,252 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 07:56:14,982 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-12 07:56:28,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1535660.0, ans=0.125 2024-08-12 07:56:30,541 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 07:56:31,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8650, loss[loss=0.09291, beats_loss=0.01043, ecapa_loss=0.0002238, whisper_loss=0.08025, over 13395.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01113, ecapa_loss=0.0001801, whisper_loss=0.09263, over 3863910.40 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:56:31,968 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 07:56:39,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1535760.0, ans=0.1 2024-08-12 07:56:43,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1535760.0, ans=0.0 2024-08-12 07:56:46,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1535860.0, ans=0.015 2024-08-12 07:56:53,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1535860.0, ans=0.0 2024-08-12 07:56:59,887 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 21 from LS+wenet, 19 from Vox, 56 fro AS 2024-08-12 07:57:02,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1535960.0, ans=0.07 2024-08-12 07:57:10,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2024-08-12 07:57:10,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2024-08-12 07:57:20,161 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 07:57:20,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1536060.0, ans=0.0 2024-08-12 07:57:25,016 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 07:57:34,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.601e+01 2.885e+01 3.263e+01 5.509e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-12 07:57:35,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1536160.0, ans=0.1 2024-08-12 07:57:45,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8700, loss[loss=0.1199, beats_loss=0.009859, ecapa_loss=0.000174, whisper_loss=0.1083, over 19056.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01116, ecapa_loss=0.0001794, whisper_loss=0.09243, over 3878135.32 frames. ], batch size: 77, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:57:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1536260.0, ans=0.125 2024-08-12 07:58:16,077 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 07:58:32,365 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:58:42,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1536660.0, ans=0.0 2024-08-12 07:58:57,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8750, loss[loss=0.1118, beats_loss=0.01059, ecapa_loss=0.0001619, whisper_loss=0.09954, over 22672.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0111, ecapa_loss=0.0001794, whisper_loss=0.09238, over 3849348.63 frames. ], batch size: 91, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:58:57,834 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 07:58:59,146 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 07:59:14,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0 2024-08-12 07:59:41,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2024-08-12 07:59:49,865 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 07:59:50,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1537060.0, ans=0.0 2024-08-12 07:59:59,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.459e+01 2.752e+01 3.200e+01 4.704e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 08:00:06,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2024-08-12 08:00:08,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1537260.0, ans=0.2 2024-08-12 08:00:09,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8800, loss[loss=0.09816, beats_loss=0.01223, ecapa_loss=0.0002007, whisper_loss=0.08392, over 20390.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01119, ecapa_loss=0.0001788, whisper_loss=0.09255, over 3867997.72 frames. ], batch size: 85, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:00:18,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2024-08-12 08:00:19,770 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-12 08:00:28,512 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 08:00:35,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1537360.0, ans=0.2 2024-08-12 08:00:59,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1537560.0, ans=0.0 2024-08-12 08:01:09,457 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 08:01:14,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1537660.0, ans=0.125 2024-08-12 08:01:15,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1537660.0, ans=0.125 2024-08-12 08:01:22,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8850, loss[loss=0.08858, beats_loss=0.009919, ecapa_loss=0.0002195, whisper_loss=0.07647, over 15127.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01122, ecapa_loss=0.0001763, whisper_loss=0.09262, over 3878353.23 frames. ], batch size: 61, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:01:24,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1537760.0, ans=0.125 2024-08-12 08:01:48,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1537860.0, ans=0.0 2024-08-12 08:01:58,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1537960.0, ans=0.125 2024-08-12 08:02:03,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1537960.0, ans=0.05 2024-08-12 08:02:16,527 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 08:02:18,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1538060.0, ans=0.1 2024-08-12 08:02:26,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.513e+01 2.817e+01 3.159e+01 3.465e+02, threshold=5.633e+01, percent-clipped=4.0 2024-08-12 08:02:26,989 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 30 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 08:02:30,288 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 08:02:33,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1538160.0, ans=0.0 2024-08-12 08:02:36,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8900, loss[loss=0.07806, beats_loss=0.01068, ecapa_loss=0.000211, whisper_loss=0.06527, over 12882.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.0001766, whisper_loss=0.09219, over 3836882.93 frames. ], batch size: 53, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:02:48,784 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.186e-02 2024-08-12 08:02:50,515 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:02:54,577 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 08:03:03,443 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 08:03:04,729 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-12 08:03:26,675 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
20 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 08:03:36,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1538660.0, ans=0.1 2024-08-12 08:03:40,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2024-08-12 08:03:46,318 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 08:03:48,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-12 08:03:50,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 8950, loss[loss=0.1259, beats_loss=0.01078, ecapa_loss=0.0001581, whisper_loss=0.1135, over 23246.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01113, ecapa_loss=0.0001774, whisper_loss=0.09247, over 3831478.15 frames. ], batch size: 88, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:04:18,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1538960.0, ans=0.125 2024-08-12 08:04:21,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-12 08:04:28,348 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 08:04:36,731 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-12 08:04:52,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.556e+01 2.825e+01 3.281e+01 7.768e+01, threshold=5.651e+01, percent-clipped=1.0 2024-08-12 08:04:57,168 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
38 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 08:04:59,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-12 08:05:02,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9000, loss[loss=0.0978, beats_loss=0.01186, ecapa_loss=0.0001551, whisper_loss=0.08439, over 20265.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001785, whisper_loss=0.09278, over 3855518.30 frames. ], batch size: 81, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:05:02,460 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 08:05:41,702 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006109, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 08:05:51,158 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1418, 3.3572, 3.4122, 3.0812], device='cuda:3') 2024-08-12 08:05:59,624 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on SV_voxceleb1: loss=0.004943, beats_loss=0, ecapa_loss=0.0004943, whisper_loss=0, over 939242.00 frames. 2024-08-12 08:07:52,951 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on AT_audioset: loss=0.02436, beats_loss=0.02436, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 08:07:53,019 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 08:07:56,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1539260.0, ans=0.0 2024-08-12 08:08:13,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1539360.0, ans=0.125 2024-08-12 08:08:22,165 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 08:08:43,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2024-08-12 08:08:51,549 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 08:09:00,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1539660.0, ans=0.2 2024-08-12 08:09:05,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9050, loss[loss=0.101, beats_loss=0.0101, ecapa_loss=0.0001737, whisper_loss=0.08913, over 15671.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01114, ecapa_loss=0.0001783, whisper_loss=0.09245, over 3863235.24 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:09:10,749 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 08:09:10,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1539760.0, ans=0.0 2024-08-12 08:09:24,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1539860.0, ans=0.125 2024-08-12 08:09:28,357 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 42 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 08:09:39,827 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 08:09:50,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1540060.0, ans=0.1 2024-08-12 08:10:09,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.551e+01 2.907e+01 3.420e+01 5.824e+01, threshold=5.813e+01, percent-clipped=1.0 2024-08-12 08:10:11,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1540160.0, ans=0.0 2024-08-12 08:10:19,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9100, loss[loss=0.09092, beats_loss=0.01233, ecapa_loss=0.0001542, whisper_loss=0.07705, over 20324.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001793, whisper_loss=0.09242, over 3856077.44 frames. ], batch size: 81, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:10:27,682 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.802e-01 2024-08-12 08:10:30,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1540260.0, ans=0.09899494936611666 2024-08-12 08:10:39,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1540360.0, ans=0.125 2024-08-12 08:10:49,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1540460.0, ans=0.2 2024-08-12 08:11:00,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1540460.0, ans=0.125 2024-08-12 08:11:08,403 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
20 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-12 08:11:22,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1540660.0, ans=0.1 2024-08-12 08:11:32,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1540760.0, ans=0.125 2024-08-12 08:11:33,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9150, loss[loss=0.08746, beats_loss=0.01326, ecapa_loss=0.0002254, whisper_loss=0.07195, over 20405.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.000179, whisper_loss=0.09176, over 3863976.38 frames. ], batch size: 90, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:11:51,390 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 08:12:05,872 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 08:12:19,264 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.557e+02 2024-08-12 08:12:21,822 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 08:12:29,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-12 08:12:35,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.613e+01 2.813e+01 3.154e+01 4.389e+01, threshold=5.626e+01, percent-clipped=0.0 2024-08-12 08:12:45,038 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 08:12:46,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9200, loss[loss=0.1325, beats_loss=0.008136, ecapa_loss=0.000201, whisper_loss=0.1224, over 20568.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001784, whisper_loss=0.09245, over 3887634.11 frames. ], batch size: 79, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:12:59,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1541360.0, ans=0.0 2024-08-12 08:13:06,390 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 08:13:22,310 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 08:13:52,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1541660.0, ans=0.125 2024-08-12 08:13:57,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9250, loss[loss=0.11, beats_loss=0.01286, ecapa_loss=0.0001582, whisper_loss=0.09557, over 20516.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01108, ecapa_loss=0.0001789, whisper_loss=0.092, over 3897360.18 frames. ], batch size: 82, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:14:24,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1541860.0, ans=0.125 2024-08-12 08:14:25,733 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 08:14:31,443 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 08:14:37,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1541960.0, ans=0.0 2024-08-12 08:14:37,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0 2024-08-12 08:14:44,640 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-12 08:14:58,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1542160.0, ans=0.07 2024-08-12 08:15:00,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.527e+01 2.843e+01 3.278e+01 5.057e+01, threshold=5.687e+01, percent-clipped=0.0 2024-08-12 08:15:11,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9300, loss[loss=0.1122, beats_loss=0.008616, ecapa_loss=0.0001683, whisper_loss=0.1019, over 16549.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001773, whisper_loss=0.09223, over 3896927.32 frames. ], batch size: 58, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:15:31,763 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 08:15:45,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1542460.0, ans=0.0 2024-08-12 08:15:55,761 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 08:16:01,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1542560.0, ans=0.2 2024-08-12 08:16:11,339 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 08:16:26,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9350, loss[loss=0.09757, beats_loss=0.009864, ecapa_loss=0.0002006, whisper_loss=0.0857, over 21722.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01098, ecapa_loss=0.0001791, whisper_loss=0.09293, over 3909201.63 frames. ], batch size: 91, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:16:37,604 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 08:16:46,738 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-12 08:16:50,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2024-08-12 08:16:52,905 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 08:16:53,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1542860.0, ans=0.2 2024-08-12 08:16:57,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1542960.0, ans=0.125 2024-08-12 08:17:11,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1543060.0, ans=0.125 2024-08-12 08:17:14,012 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 08:17:16,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543060.0, ans=0.125 2024-08-12 08:17:23,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1543060.0, ans=0.125 2024-08-12 08:17:31,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.553e+01 2.819e+01 3.364e+01 6.243e+01, threshold=5.639e+01, percent-clipped=2.0 2024-08-12 08:17:43,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9400, loss[loss=0.11, beats_loss=0.007737, ecapa_loss=0.0001787, whisper_loss=0.1005, over 20160.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01089, ecapa_loss=0.0001801, whisper_loss=0.09326, over 3899572.17 frames. 
], batch size: 76, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:17:46,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1543260.0, ans=0.125 2024-08-12 08:17:56,950 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-12 08:17:58,300 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 08:18:21,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543460.0, ans=0.1 2024-08-12 08:18:23,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1543460.0, ans=0.125 2024-08-12 08:18:27,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5 2024-08-12 08:18:50,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-08-12 08:18:58,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1543760.0, ans=0.125 2024-08-12 08:18:58,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9450, loss[loss=0.1155, beats_loss=0.01176, ecapa_loss=0.000176, whisper_loss=0.102, over 22943.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001789, whisper_loss=0.09256, over 3877345.69 frames. ], batch size: 91, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:18:59,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1543760.0, ans=0.125 2024-08-12 08:19:04,863 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 08:19:09,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1543760.0, ans=0.1 2024-08-12 08:19:14,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1543860.0, ans=0.1 2024-08-12 08:19:21,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1543860.0, ans=0.0 2024-08-12 08:19:24,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-12 08:19:27,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1543960.0, ans=0.125 2024-08-12 08:19:41,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1544060.0, ans=0.125 2024-08-12 08:19:49,280 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-12 08:19:53,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1544060.0, ans=0.125 2024-08-12 08:19:55,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1544060.0, ans=0.1 2024-08-12 08:20:01,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1544160.0, ans=22.5 2024-08-12 08:20:01,418 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.483e+01 2.821e+01 3.317e+01 4.965e+01, threshold=5.642e+01, percent-clipped=0.0 2024-08-12 08:20:11,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9500, loss[loss=0.09418, beats_loss=0.01235, ecapa_loss=0.000185, whisper_loss=0.07998, over 17018.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001787, whisper_loss=0.09266, over 3897978.88 frames. ], batch size: 71, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:20:19,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-08-12 08:20:27,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544360.0, ans=0.1 2024-08-12 08:20:28,413 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 08:20:31,706 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 08:20:31,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1544360.0, ans=0.0 2024-08-12 08:20:52,222 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 08:20:53,460 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 08:20:59,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1544560.0, ans=0.1 2024-08-12 08:21:01,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544560.0, ans=0.1 2024-08-12 08:21:07,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1544560.0, ans=0.125 2024-08-12 08:21:24,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9550, loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.0001828, whisper_loss=0.0911, over 22753.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01097, ecapa_loss=0.0001802, whisper_loss=0.0924, over 3863551.50 frames. ], batch size: 93, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:21:38,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=12.0 2024-08-12 08:22:11,344 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 08:22:26,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.581e+01 2.910e+01 3.415e+01 4.856e+01, threshold=5.819e+01, percent-clipped=0.0 2024-08-12 08:22:36,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9600, loss[loss=0.1212, beats_loss=0.009084, ecapa_loss=0.0001669, whisper_loss=0.1104, over 14683.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001813, whisper_loss=0.09155, over 3853507.45 frames. 
], batch size: 55, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:23:10,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1545460.0, ans=0.0 2024-08-12 08:23:20,452 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 08:23:49,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9650, loss[loss=0.105, beats_loss=0.01233, ecapa_loss=0.0001617, whisper_loss=0.09108, over 16862.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.0001803, whisper_loss=0.09165, over 3839556.21 frames. ], batch size: 66, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:23:50,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1545760.0, ans=0.2 2024-08-12 08:24:14,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1545860.0, ans=0.125 2024-08-12 08:24:17,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1545960.0, ans=0.125 2024-08-12 08:24:22,278 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 08:24:25,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1545960.0, ans=0.0 2024-08-12 08:24:41,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.05 vs. limit=22.5 2024-08-12 08:24:50,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.490e+01 2.776e+01 3.280e+01 4.565e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 08:24:50,646 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 08:24:51,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1546160.0, ans=0.1 2024-08-12 08:25:00,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9700, loss[loss=0.1378, beats_loss=0.008228, ecapa_loss=0.0002003, whisper_loss=0.1275, over 20880.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001812, whisper_loss=0.09157, over 3810100.44 frames. ], batch size: 83, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:25:17,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2024-08-12 08:25:17,883 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 08:25:19,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1546360.0, ans=0.125 2024-08-12 08:25:23,338 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-12 08:25:37,689 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 08:25:43,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1546560.0, ans=0.125 2024-08-12 08:25:45,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1546560.0, ans=0.125 2024-08-12 08:26:09,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. 
limit=15.0 2024-08-12 08:26:12,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1546760.0, ans=0.125 2024-08-12 08:26:13,103 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9750, loss[loss=0.07475, beats_loss=0.009373, ecapa_loss=0.0001992, whisper_loss=0.06338, over 13249.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01101, ecapa_loss=0.0001806, whisper_loss=0.09098, over 3836760.87 frames. ], batch size: 53, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:26:16,306 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 08:26:36,242 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 08:26:44,673 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 08:27:08,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1547060.0, ans=0.125 2024-08-12 08:27:16,895 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.437e+01 2.801e+01 3.445e+01 6.244e+01, threshold=5.602e+01, percent-clipped=1.0 2024-08-12 08:27:25,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1547160.0, ans=0.125 2024-08-12 08:27:27,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9800, loss[loss=0.09738, beats_loss=0.01086, ecapa_loss=0.0002096, whisper_loss=0.08442, over 18498.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01106, ecapa_loss=0.0001797, whisper_loss=0.09072, over 3843882.01 frames. ], batch size: 83, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:27:34,821 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 08:27:42,907 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:27:43,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1547360.0, ans=0.2 2024-08-12 08:28:17,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1547560.0, ans=0.05 2024-08-12 08:28:20,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2024-08-12 08:28:42,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9850, loss[loss=0.1117, beats_loss=0.01148, ecapa_loss=0.0001623, whisper_loss=0.09861, over 22649.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01111, ecapa_loss=0.0001798, whisper_loss=0.09089, over 3900917.37 frames. ], batch size: 88, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:28:48,859 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:28:58,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1547860.0, ans=0.125 2024-08-12 08:29:11,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-12 08:29:15,949 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 08:29:20,490 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 08:29:47,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.546e+01 2.857e+01 3.272e+01 5.247e+01, threshold=5.713e+01, percent-clipped=0.0 2024-08-12 08:29:57,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9900, loss[loss=0.1036, beats_loss=0.01108, ecapa_loss=0.0001715, whisper_loss=0.09085, over 13899.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01119, ecapa_loss=0.000179, whisper_loss=0.09115, over 3896674.93 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:29:59,504 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 08:30:18,717 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 08:30:25,906 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 08:30:30,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1548460.0, ans=0.125 2024-08-12 08:30:43,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548560.0, ans=0.1 2024-08-12 08:30:46,165 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 08:30:46,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1548560.0, ans=0.125 2024-08-12 08:30:53,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1548560.0, ans=10.0 2024-08-12 08:30:54,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.76 vs. 
limit=15.0 2024-08-12 08:30:58,126 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 08:31:05,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1548660.0, ans=0.1 2024-08-12 08:31:07,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1548660.0, ans=0.1 2024-08-12 08:31:10,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 9950, loss[loss=0.1256, beats_loss=0.008588, ecapa_loss=0.000202, whisper_loss=0.115, over 20345.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001811, whisper_loss=0.09188, over 3883911.29 frames. ], batch size: 78, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:31:19,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1548760.0, ans=0.125 2024-08-12 08:31:34,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1548860.0, ans=0.2 2024-08-12 08:31:44,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1548960.0, ans=0.125 2024-08-12 08:31:56,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-12 08:32:13,629 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 08:32:14,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.498e+01 2.780e+01 3.249e+01 5.152e+01, threshold=5.559e+01, percent-clipped=0.0 2024-08-12 08:32:15,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1549160.0, ans=0.1 2024-08-12 08:32:16,680 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 13 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 08:32:24,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10000, loss[loss=0.1321, beats_loss=0.009802, ecapa_loss=0.0001397, whisper_loss=0.1209, over 18201.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0111, ecapa_loss=0.0001808, whisper_loss=0.0924, over 3886026.10 frames. ], batch size: 67, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:32:25,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1549260.0, ans=0.2 2024-08-12 08:32:32,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=15.0 2024-08-12 08:32:38,999 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.754e-02 2024-08-12 08:33:16,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1549560.0, ans=0.1 2024-08-12 08:33:16,417 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:33:24,263 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 08:33:38,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10050, loss[loss=0.1064, beats_loss=0.01275, ecapa_loss=0.0001444, whisper_loss=0.09216, over 14218.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01101, ecapa_loss=0.0001804, whisper_loss=0.09205, over 3895066.42 frames. ], batch size: 58, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:33:42,955 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 08:34:00,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1549860.0, ans=0.125 2024-08-12 08:34:40,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.497e+01 2.870e+01 3.338e+01 7.482e+01, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 08:34:44,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1550160.0, ans=0.1 2024-08-12 08:34:51,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10100, loss[loss=0.1042, beats_loss=0.007818, ecapa_loss=0.0002803, whisper_loss=0.09355, over 12962.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01116, ecapa_loss=0.000179, whisper_loss=0.09143, over 3902595.95 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:34:58,284 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 08:35:01,199 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 08:35:20,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1550460.0, ans=0.125 2024-08-12 08:35:20,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1550460.0, ans=0.95 2024-08-12 08:35:29,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1550460.0, ans=0.2 2024-08-12 08:35:31,309 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 08:35:40,708 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 08:36:04,972 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10150, loss[loss=0.09642, beats_loss=0.01386, ecapa_loss=0.0001477, whisper_loss=0.08108, over 22651.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001809, whisper_loss=0.09155, over 3892647.68 frames. ], batch size: 92, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:36:14,489 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:36:30,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2024-08-12 08:36:32,184 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 08:37:09,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1551160.0, ans=0.0 2024-08-12 08:37:10,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.557e+01 2.799e+01 3.287e+01 1.688e+02, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 08:37:17,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1551160.0, ans=0.0 2024-08-12 08:37:20,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1551260.0, ans=0.0 2024-08-12 08:37:21,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10200, loss[loss=0.1009, beats_loss=0.01138, ecapa_loss=0.0001585, whisper_loss=0.08789, over 21310.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001832, whisper_loss=0.09171, over 3895891.18 frames. ], batch size: 82, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:37:24,848 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 08:37:42,073 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 08:37:49,809 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-12 08:37:59,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551460.0, ans=0.1 2024-08-12 08:38:04,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-12 08:38:13,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.95 vs. 
limit=15.0 2024-08-12 08:38:17,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1551560.0, ans=0.95 2024-08-12 08:38:31,659 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 08:38:37,565 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 08:38:38,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. limit=10.0 2024-08-12 08:38:38,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10250, loss[loss=0.134, beats_loss=0.007845, ecapa_loss=0.0002108, whisper_loss=0.1241, over 23072.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.0001833, whisper_loss=0.09236, over 3921147.20 frames. ], batch size: 91, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:38:45,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1551760.0, ans=0.0 2024-08-12 08:38:59,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1551860.0, ans=0.125 2024-08-12 08:39:05,306 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 08:39:12,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-12 08:39:23,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1551960.0, ans=0.0 2024-08-12 08:39:31,388 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 08:39:31,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1552060.0, ans=0.125 2024-08-12 08:39:46,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.422e+01 2.707e+01 3.104e+01 5.382e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 08:39:46,645 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 08:39:47,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-12 08:39:49,967 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 08:39:52,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-12 08:39:57,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10300, loss[loss=0.09496, beats_loss=0.0099, ecapa_loss=0.000182, whisper_loss=0.08324, over 13730.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001814, whisper_loss=0.0925, over 3938641.92 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:39:58,947 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 08:39:59,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1552260.0, ans=0.125 2024-08-12 08:40:07,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1552260.0, ans=0.0 2024-08-12 08:40:10,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1552260.0, ans=0.09899494936611666 2024-08-12 08:40:16,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-12 08:40:17,292 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 08:40:53,657 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 08:41:13,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-08-12 08:41:13,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10350, loss[loss=0.08664, beats_loss=0.009985, ecapa_loss=0.0002269, whisper_loss=0.07439, over 17197.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001815, whisper_loss=0.09256, over 3935155.13 frames. ], batch size: 70, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:41:21,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1552760.0, ans=0.0 2024-08-12 08:41:25,789 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 08:41:29,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.74 vs. 
limit=15.0 2024-08-12 08:41:33,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1552860.0, ans=0.07 2024-08-12 08:41:34,714 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 08:41:53,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-08-12 08:42:17,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.587e+01 2.793e+01 3.199e+01 6.798e+01, threshold=5.587e+01, percent-clipped=1.0 2024-08-12 08:42:17,453 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 08:42:20,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1553160.0, ans=0.125 2024-08-12 08:42:27,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10400, loss[loss=0.09977, beats_loss=0.007717, ecapa_loss=0.0002304, whisper_loss=0.08975, over 16825.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01098, ecapa_loss=0.0001805, whisper_loss=0.09227, over 3870681.94 frames. ], batch size: 70, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:42:34,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1553260.0, ans=0.0 2024-08-12 08:42:37,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1553260.0, ans=0.125 2024-08-12 08:43:08,950 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 08:43:26,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1553660.0, ans=0.1 2024-08-12 08:43:32,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1553660.0, ans=0.0 2024-08-12 08:43:43,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10450, loss[loss=0.08268, beats_loss=0.0146, ecapa_loss=0.000136, whisper_loss=0.06673, over 19286.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001794, whisper_loss=0.09235, over 3885943.14 frames. ], batch size: 77, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:43:48,331 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 08:44:20,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1553960.0, ans=0.0 2024-08-12 08:44:25,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1553960.0, ans=0.0 2024-08-12 08:44:31,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-08-12 08:44:37,128 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 08:44:40,075 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 08:44:41,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1554060.0, ans=0.125 2024-08-12 08:44:48,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.496e+01 2.841e+01 3.416e+01 4.859e+01, threshold=5.681e+01, percent-clipped=0.0 2024-08-12 08:44:49,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1554160.0, ans=0.125 2024-08-12 08:44:52,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1554160.0, ans=0.125 2024-08-12 08:44:59,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10500, loss[loss=0.1153, beats_loss=0.01068, ecapa_loss=0.0001458, whisper_loss=0.1032, over 16221.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01092, ecapa_loss=0.0001806, whisper_loss=0.0929, over 3879017.85 frames. ], batch size: 63, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:45:06,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1554260.0, ans=0.0 2024-08-12 08:45:26,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1554360.0, ans=0.2 2024-08-12 08:45:27,381 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 17 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 08:45:32,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-12 08:45:32,636 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 08:45:35,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2024-08-12 08:45:35,898 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-12 08:45:47,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1554560.0, ans=6.0 2024-08-12 08:46:12,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1554760.0, ans=0.2 2024-08-12 08:46:12,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10550, loss[loss=0.09765, beats_loss=0.01231, ecapa_loss=0.0001536, whisper_loss=0.08381, over 17702.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001813, whisper_loss=0.09167, over 3860766.71 frames. ], batch size: 69, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:46:15,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1554760.0, ans=0.5 2024-08-12 08:46:22,698 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-12 08:46:46,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1554960.0, ans=0.0 2024-08-12 08:46:46,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-12 08:46:58,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555060.0, ans=0.1 2024-08-12 08:46:59,332 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 08:47:07,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1555060.0, ans=0.1 2024-08-12 08:47:10,325 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.983e+01 2024-08-12 08:47:13,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1555160.0, ans=0.0 2024-08-12 08:47:18,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.542e+01 2.754e+01 3.046e+01 4.371e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-12 08:47:29,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10600, loss[loss=0.1061, beats_loss=0.009326, ecapa_loss=0.0001869, whisper_loss=0.09495, over 21647.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001808, whisper_loss=0.0922, over 3920759.36 frames. ], batch size: 83, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:47:31,558 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 08:47:41,730 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-12 08:47:43,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1555360.0, ans=0.125 2024-08-12 08:47:46,364 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 08:47:51,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-12 08:47:52,507 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 08:47:53,828 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 08:48:14,639 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 08:48:16,060 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 08:48:26,000 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 08:48:30,165 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 08:48:31,502 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 41 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 08:48:43,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10650, loss[loss=0.1232, beats_loss=0.008023, ecapa_loss=0.0001907, whisper_loss=0.1133, over 24004.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001798, whisper_loss=0.09262, over 3910603.28 frames. ], batch size: 91, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:49:06,911 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 08:49:07,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1555860.0, ans=0.125 2024-08-12 08:49:14,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1555960.0, ans=0.0 2024-08-12 08:49:30,468 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 08:49:37,497 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
16 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 08:49:47,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.652e+01 2.957e+01 3.394e+01 5.576e+01, threshold=5.914e+01, percent-clipped=1.0 2024-08-12 08:49:58,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10700, loss[loss=0.1188, beats_loss=0.009024, ecapa_loss=0.000191, whisper_loss=0.1079, over 21949.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001791, whisper_loss=0.09259, over 3907065.10 frames. ], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:50:02,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1556260.0, ans=0.1 2024-08-12 08:50:25,858 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 08:50:37,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1556460.0, ans=0.0 2024-08-12 08:50:52,222 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 08:51:06,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0 2024-08-12 08:51:12,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10750, loss[loss=0.09456, beats_loss=0.008675, ecapa_loss=0.0002108, whisper_loss=0.08378, over 15695.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01102, ecapa_loss=0.0001795, whisper_loss=0.09321, over 3917228.60 frames. ], batch size: 64, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:51:14,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1556760.0, ans=0.0 2024-08-12 08:51:29,391 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 08:51:38,903 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-12 08:51:39,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2024-08-12 08:51:49,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1556960.0, ans=0.0 2024-08-12 08:51:49,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1556960.0, ans=0.07 2024-08-12 08:51:59,008 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 08:52:03,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1557060.0, ans=0.02 2024-08-12 08:52:04,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1557060.0, ans=0.125 2024-08-12 08:52:04,858 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 13 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 08:52:07,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-12 08:52:10,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.48 vs. 
limit=15.0 2024-08-12 08:52:17,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.513e+01 2.826e+01 3.158e+01 5.993e+01, threshold=5.652e+01, percent-clipped=1.0 2024-08-12 08:52:28,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10800, loss[loss=0.1002, beats_loss=0.01217, ecapa_loss=0.000171, whisper_loss=0.08631, over 17532.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01103, ecapa_loss=0.0001794, whisper_loss=0.09353, over 3904024.42 frames. ], batch size: 69, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:52:38,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1557260.0, ans=0.125 2024-08-12 08:52:43,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1557360.0, ans=0.125 2024-08-12 08:52:43,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1557360.0, ans=0.1 2024-08-12 08:52:49,467 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:53:04,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1557460.0, ans=0.125 2024-08-12 08:53:15,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1557560.0, ans=0.1 2024-08-12 08:53:42,611 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10850, loss[loss=0.1212, beats_loss=0.01113, ecapa_loss=0.0001809, whisper_loss=0.1082, over 22890.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01104, ecapa_loss=0.0001789, whisper_loss=0.09346, over 3934116.95 frames. ], batch size: 91, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:54:00,644 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 08:54:06,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1557860.0, ans=0.0 2024-08-12 08:54:39,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.258e-02 2024-08-12 08:54:45,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558160.0, ans=0.1 2024-08-12 08:54:47,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.594e+01 2.957e+01 3.345e+01 7.139e+01, threshold=5.915e+01, percent-clipped=2.0 2024-08-12 08:54:58,057 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 08:54:58,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558260.0, ans=0.1 2024-08-12 08:54:59,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10900, loss[loss=0.1057, beats_loss=0.01092, ecapa_loss=0.0002359, whisper_loss=0.09241, over 19391.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01106, ecapa_loss=0.0001794, whisper_loss=0.09323, over 3971274.47 frames. ], batch size: 84, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:55:17,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558360.0, ans=0.1 2024-08-12 08:55:20,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1558360.0, ans=0.125 2024-08-12 08:55:27,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1558360.0, ans=0.125 2024-08-12 08:55:36,084 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 08:55:48,696 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 08:55:58,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1558560.0, ans=0.0 2024-08-12 08:56:02,738 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-12 08:56:17,807 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 08:56:18,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 10950, loss[loss=0.09898, beats_loss=0.01167, ecapa_loss=0.0001908, whisper_loss=0.0854, over 21848.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001797, whisper_loss=0.0933, over 3958911.92 frames. ], batch size: 93, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:56:29,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1558760.0, ans=0.2 2024-08-12 08:56:37,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1558860.0, ans=0.035 2024-08-12 08:56:39,113 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 08:56:42,747 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-12 08:56:52,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1558860.0, ans=0.125 2024-08-12 08:56:53,969 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 08:57:12,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1559060.0, ans=0.0 2024-08-12 08:57:17,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1559060.0, ans=0.125 2024-08-12 08:57:25,543 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 08:57:36,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.556e+01 2.763e+01 3.156e+01 4.815e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 08:57:45,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1559160.0, ans=0.2 2024-08-12 08:57:49,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1559260.0, ans=0.125 2024-08-12 08:57:50,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11000, loss[loss=0.1159, beats_loss=0.01108, ecapa_loss=0.0001637, whisper_loss=0.1032, over 22150.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01089, ecapa_loss=0.0001811, whisper_loss=0.09384, over 3980229.81 frames. ], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:58:27,250 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 08:59:13,499 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11050, loss[loss=0.1072, beats_loss=0.009051, ecapa_loss=0.0001835, whisper_loss=0.09629, over 14251.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01084, ecapa_loss=0.0001822, whisper_loss=0.09361, over 3957157.16 frames. 
], batch size: 56, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:59:20,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1559760.0, ans=0.1 2024-08-12 08:59:21,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1559760.0, ans=0.125 2024-08-12 08:59:31,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1559760.0, ans=0.125 2024-08-12 08:59:50,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1559860.0, ans=0.0 2024-08-12 09:00:02,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-12 09:00:18,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1559960.0, ans=0.125 2024-08-12 09:00:22,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1560060.0, ans=0.0 2024-08-12 09:00:30,525 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 28 from Vox, 19 fro AS 2024-08-12 09:00:31,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2024-08-12 09:00:34,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1560060.0, ans=0.05 2024-08-12 09:00:46,347 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 09:00:49,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.392e+01 2.745e+01 3.211e+01 4.714e+01, threshold=5.490e+01, percent-clipped=0.0 2024-08-12 09:00:50,433 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-12 09:01:04,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11100, loss[loss=0.1293, beats_loss=0.01032, ecapa_loss=0.0001223, whisper_loss=0.1177, over 25433.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01084, ecapa_loss=0.0001823, whisper_loss=0.09342, over 3973341.51 frames. ], batch size: 91, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:01:13,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1560260.0, ans=0.0 2024-08-12 09:01:15,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1560260.0, ans=10.0 2024-08-12 09:01:22,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1560260.0, ans=0.125 2024-08-12 09:01:23,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-12 09:01:42,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1560360.0, ans=0.125 2024-08-12 09:02:26,418 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 09:02:28,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1560560.0, ans=0.0 2024-08-12 09:02:53,368 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 09:02:59,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11150, loss[loss=0.09691, beats_loss=0.01026, ecapa_loss=0.0001227, whisper_loss=0.08542, over 16555.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01083, ecapa_loss=0.0001812, whisper_loss=0.09366, over 3966202.67 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:03:14,256 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 09:03:57,243 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 09:04:23,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=22.5 2024-08-12 09:04:25,592 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 09:04:28,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1561160.0, ans=0.0 2024-08-12 09:04:29,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.591e+01 2.914e+01 3.431e+01 1.120e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 09:04:40,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11200, loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001928, whisper_loss=0.09231, over 19449.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01082, ecapa_loss=0.000183, whisper_loss=0.09356, over 3933197.84 frames. ], batch size: 79, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:04:45,311 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 09:04:53,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. 
limit=22.5 2024-08-12 09:04:59,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=12.0 2024-08-12 09:05:00,069 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 09:05:01,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1561360.0, ans=0.125 2024-08-12 09:05:18,408 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 09:05:33,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561560.0, ans=0.1 2024-08-12 09:05:55,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11250, loss[loss=0.1222, beats_loss=0.008508, ecapa_loss=0.0001727, whisper_loss=0.1119, over 22032.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01077, ecapa_loss=0.0001836, whisper_loss=0.09348, over 3933442.72 frames. ], batch size: 85, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:06:05,488 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 34 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 09:06:11,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1561860.0, ans=0.2 2024-08-12 09:06:14,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1561860.0, ans=0.0 2024-08-12 09:06:17,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-12 09:06:28,047 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
17 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 09:06:32,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1561960.0, ans=0.125 2024-08-12 09:06:40,256 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 09:06:41,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1562060.0, ans=0.1 2024-08-12 09:07:00,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-12 09:07:00,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.463e+01 2.812e+01 3.090e+01 4.861e+01, threshold=5.624e+01, percent-clipped=0.0 2024-08-12 09:07:12,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11300, loss[loss=0.1059, beats_loss=0.01119, ecapa_loss=0.0001627, whisper_loss=0.09307, over 18146.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01071, ecapa_loss=0.0001829, whisper_loss=0.09351, over 3897994.51 frames. ], batch size: 70, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:07:12,499 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 09:07:14,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1562260.0, ans=0.04949747468305833 2024-08-12 09:07:31,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1562360.0, ans=0.125 2024-08-12 09:07:43,663 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 41 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 09:08:12,353 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 09:08:27,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11350, loss[loss=0.09874, beats_loss=0.01129, ecapa_loss=0.000168, whisper_loss=0.08577, over 21265.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01073, ecapa_loss=0.0001819, whisper_loss=0.09389, over 3911863.97 frames. ], batch size: 86, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:08:52,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1562860.0, ans=0.125 2024-08-12 09:09:16,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1563060.0, ans=0.125 2024-08-12 09:09:32,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.664e+01 2.943e+01 3.527e+01 6.465e+01, threshold=5.886e+01, percent-clipped=3.0 2024-08-12 09:09:37,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1563160.0, ans=0.125 2024-08-12 09:09:40,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1563160.0, ans=0.2 2024-08-12 09:09:43,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11400, loss[loss=0.08265, beats_loss=0.01447, ecapa_loss=0.0001768, whisper_loss=0.06642, over 19693.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01076, ecapa_loss=0.0001815, whisper_loss=0.09374, over 3921770.71 frames. ], batch size: 84, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:09:44,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.74 vs. limit=22.5 2024-08-12 09:09:49,821 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 09:09:52,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1563260.0, ans=0.125 2024-08-12 09:09:54,217 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 09:10:15,608 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 09:10:19,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1563460.0, ans=0.07 2024-08-12 09:10:38,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1563560.0, ans=0.125 2024-08-12 09:10:39,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1563560.0, ans=10.0 2024-08-12 09:10:58,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11450, loss[loss=0.1159, beats_loss=0.0105, ecapa_loss=0.0001953, whisper_loss=0.1034, over 22487.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01078, ecapa_loss=0.0001821, whisper_loss=0.09356, over 3889742.21 frames. ], batch size: 87, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:11:04,551 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 09:11:09,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1563760.0, ans=0.125 2024-08-12 09:11:31,068 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.964e-01 2024-08-12 09:11:43,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. 
limit=15.0 2024-08-12 09:11:57,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-12 09:12:01,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.691e+01 2.984e+01 3.648e+01 5.377e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 09:12:12,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11500, loss[loss=0.08435, beats_loss=0.01271, ecapa_loss=0.0002158, whisper_loss=0.06948, over 21225.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01086, ecapa_loss=0.00018, whisper_loss=0.09298, over 3904718.56 frames. ], batch size: 90, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:12:19,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1564260.0, ans=0.125 2024-08-12 09:12:41,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1564460.0, ans=0.0 2024-08-12 09:12:47,348 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 09:12:47,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1564460.0, ans=0.125 2024-08-12 09:12:48,714 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 09:13:01,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2024-08-12 09:13:02,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1564560.0, ans=0.2 2024-08-12 09:13:07,984 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 09:13:11,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1564660.0, ans=0.125 2024-08-12 09:13:26,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11550, loss[loss=0.1073, beats_loss=0.01137, ecapa_loss=0.000199, whisper_loss=0.09394, over 22019.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01084, ecapa_loss=0.0001801, whisper_loss=0.09315, over 3890819.35 frames. ], batch size: 92, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:13:32,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1564760.0, ans=0.05 2024-08-12 09:13:37,017 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:13:39,686 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 09:13:51,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1564860.0, ans=0.0 2024-08-12 09:14:10,128 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 09:14:32,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.783e+01 3.251e+01 6.274e+01, threshold=5.566e+01, percent-clipped=2.0 2024-08-12 09:14:34,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1565160.0, ans=0.125 2024-08-12 09:14:41,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11600, loss[loss=0.1331, beats_loss=0.006989, ecapa_loss=0.000229, whisper_loss=0.1238, over 16321.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01089, ecapa_loss=0.0001786, whisper_loss=0.09311, over 3904575.74 frames. 
], batch size: 63, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:14:56,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1565360.0, ans=0.0 2024-08-12 09:14:59,592 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 09:15:01,205 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 09:15:06,601 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 09:15:35,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1565560.0, ans=0.1 2024-08-12 09:15:40,444 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 09:15:42,050 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 09:15:46,344 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 19 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-12 09:15:52,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11650, loss[loss=0.07702, beats_loss=0.01241, ecapa_loss=0.0001974, whisper_loss=0.06263, over 19095.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01096, ecapa_loss=0.0001791, whisper_loss=0.09226, over 3925702.32 frames. ], batch size: 81, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:16:11,408 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 09:16:29,880 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 09:16:32,556 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 09:16:53,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.448e+01 2.832e+01 3.122e+01 7.544e+01, threshold=5.665e+01, percent-clipped=2.0 2024-08-12 09:17:03,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11700, loss[loss=0.1119, beats_loss=0.01287, ecapa_loss=0.0001791, whisper_loss=0.09727, over 22483.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01099, ecapa_loss=0.0001794, whisper_loss=0.09274, over 3936528.39 frames. ], batch size: 91, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:17:17,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1566360.0, ans=0.125 2024-08-12 09:17:50,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1566560.0, ans=0.2 2024-08-12 09:17:51,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1566560.0, ans=0.125 2024-08-12 09:17:57,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1566660.0, ans=0.2 2024-08-12 09:18:00,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-12 09:18:03,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=15.0 2024-08-12 09:18:04,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1566660.0, ans=0.2 2024-08-12 09:18:08,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.63 vs. 
limit=12.0 2024-08-12 09:18:11,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11750, loss[loss=0.1235, beats_loss=0.009513, ecapa_loss=0.0001932, whisper_loss=0.112, over 15238.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001799, whisper_loss=0.09259, over 3932638.67 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:18:16,603 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 09:18:21,948 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-12 09:18:25,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1566860.0, ans=0.0 2024-08-12 09:18:28,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2024-08-12 09:18:42,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1566960.0, ans=0.125 2024-08-12 09:18:43,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1566960.0, ans=0.125 2024-08-12 09:18:43,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1566960.0, ans=0.2 2024-08-12 09:18:48,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0 2024-08-12 09:18:51,621 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
35 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 09:18:51,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1566960.0, ans=0.125 2024-08-12 09:18:51,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1566960.0, ans=0.125 2024-08-12 09:18:57,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1567060.0, ans=0.0 2024-08-12 09:19:12,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.551e+01 2.829e+01 3.227e+01 5.711e+01, threshold=5.658e+01, percent-clipped=1.0 2024-08-12 09:19:16,108 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.285e-01 2024-08-12 09:19:22,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11800, loss[loss=0.104, beats_loss=0.0116, ecapa_loss=0.0002043, whisper_loss=0.09037, over 20291.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01108, ecapa_loss=0.0001782, whisper_loss=0.09291, over 3907500.90 frames. ], batch size: 81, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:19:22,937 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 09:19:30,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1567260.0, ans=0.125 2024-08-12 09:19:31,097 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 09:19:33,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.50 vs. 
limit=15.0 2024-08-12 09:19:42,406 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:19:43,464 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 09:20:07,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1567560.0, ans=0.125 2024-08-12 09:20:15,517 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 09:20:20,821 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 28 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 09:20:31,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11850, loss[loss=0.1149, beats_loss=0.008923, ecapa_loss=0.0002464, whisper_loss=0.1035, over 20653.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01115, ecapa_loss=0.0001777, whisper_loss=0.09236, over 3955189.23 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:20:39,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.44 vs. limit=15.0 2024-08-12 09:20:49,928 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 09:20:56,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-08-12 09:20:57,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1567860.0, ans=0.125 2024-08-12 09:21:13,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1568060.0, ans=0.125 2024-08-12 09:21:17,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2024-08-12 09:21:31,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.485e+01 2.770e+01 3.068e+01 4.213e+01, threshold=5.539e+01, percent-clipped=0.0 2024-08-12 09:21:39,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11900, loss[loss=0.1017, beats_loss=0.01184, ecapa_loss=0.0001328, whisper_loss=0.08856, over 14763.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001775, whisper_loss=0.09282, over 3982708.69 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:21:48,118 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
33 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 09:21:48,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1568260.0, ans=0.125 2024-08-12 09:21:51,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1568260.0, ans=0.0 2024-08-12 09:22:02,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1568360.0, ans=0.125 2024-08-12 09:22:26,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1568560.0, ans=0.1 2024-08-12 09:22:27,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2024-08-12 09:22:28,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1568560.0, ans=0.125 2024-08-12 09:22:29,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1568560.0, ans=0.125 2024-08-12 09:22:30,793 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 09:22:49,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 11950, loss[loss=0.1132, beats_loss=0.008344, ecapa_loss=0.0002063, whisper_loss=0.1028, over 17946.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001785, whisper_loss=0.09266, over 3950772.11 frames. 
], batch size: 69, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:22:56,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1568760.0, ans=0.125 2024-08-12 09:22:57,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1568760.0, ans=0.125 2024-08-12 09:23:03,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1568860.0, ans=0.0 2024-08-12 09:23:15,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1568860.0, ans=0.125 2024-08-12 09:23:27,933 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-12 09:23:42,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1569060.0, ans=0.0 2024-08-12 09:23:51,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.558e+01 2.859e+01 3.291e+01 5.466e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 09:23:59,009 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 09:24:00,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12000, loss[loss=0.09892, beats_loss=0.012, ecapa_loss=0.000134, whisper_loss=0.08558, over 22832.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001776, whisper_loss=0.0922, over 3915840.39 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:24:00,137 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 09:24:39,975 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0006057, whisper_loss=0.2491, over 922467.00 frames. 
2024-08-12 09:24:56,741 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on SV_voxceleb1: loss=0.004842, beats_loss=0, ecapa_loss=0.0004842, whisper_loss=0, over 939242.00 frames. 2024-08-12 09:26:06,320 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8748, 1.8367, 1.7850, 1.8087, 2.5100, 1.7603, 1.8540, 1.6901], device='cuda:3') 2024-08-12 09:26:18,607 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.5613, 1.1896, 1.2998, 1.2592, 1.6062, 1.1571, 1.2964, 1.2097], device='cuda:3') 2024-08-12 09:26:51,050 INFO [train_multi_KD3.py:1149] (3/4) Epoch 11, validation on AT_audioset: loss=0.02454, beats_loss=0.02454, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 09:26:51,056 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 09:26:51,275 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 09:26:57,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1569260.0, ans=0.125 2024-08-12 09:27:01,905 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 09:27:03,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1569260.0, ans=0.0 2024-08-12 09:27:06,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1569360.0, ans=0.125 2024-08-12 09:27:08,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1569360.0, ans=0.125 2024-08-12 09:27:11,236 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 09:27:17,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1569360.0, ans=0.0 2024-08-12 09:27:19,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1569460.0, ans=0.2 2024-08-12 09:27:21,037 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 09:27:23,951 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 09:27:26,889 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 09:27:34,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.69 vs. limit=10.0 2024-08-12 09:27:48,233 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 09:27:52,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-12 09:27:55,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1569660.0, ans=0.0 2024-08-12 09:27:57,835 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 09:27:59,249 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 09:28:01,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12050, loss[loss=0.09707, beats_loss=0.01231, ecapa_loss=0.0001841, whisper_loss=0.08292, over 19414.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001789, whisper_loss=0.09205, over 3916644.87 frames. 
], batch size: 78, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:28:11,804 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-12 09:28:12,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1569760.0, ans=0.2 2024-08-12 09:28:16,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-08-12 09:28:39,526 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 09:28:46,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1570060.0, ans=0.0 2024-08-12 09:28:57,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1570160.0, ans=0.2 2024-08-12 09:29:03,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.535e+01 2.943e+01 3.446e+01 4.689e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 09:29:04,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1570160.0, ans=0.125 2024-08-12 09:29:06,915 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 09:29:12,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12100, loss[loss=0.1087, beats_loss=0.01127, ecapa_loss=0.0001662, whisper_loss=0.09575, over 14896.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.0001806, whisper_loss=0.09219, over 3914154.44 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:29:16,514 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
24 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-12 09:29:37,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1570360.0, ans=0.07 2024-08-12 09:29:40,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1570460.0, ans=0.125 2024-08-12 09:29:57,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-08-12 09:30:22,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12150, loss[loss=0.09812, beats_loss=0.01036, ecapa_loss=0.0002109, whisper_loss=0.08565, over 13291.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001809, whisper_loss=0.09193, over 3892346.40 frames. ], batch size: 54, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:30:35,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1570760.0, ans=0.0 2024-08-12 09:30:55,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1570960.0, ans=0.0 2024-08-12 09:31:03,397 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 09:31:18,187 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 09:31:25,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.520e+01 2.822e+01 3.048e+01 5.048e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 09:31:28,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1571160.0, ans=0.1 2024-08-12 09:31:31,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571160.0, ans=0.1 2024-08-12 09:31:34,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12200, loss[loss=0.09768, beats_loss=0.01183, ecapa_loss=0.0001575, whisper_loss=0.08427, over 16348.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001796, whisper_loss=0.09152, over 3888844.87 frames. ], batch size: 66, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:31:42,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1571260.0, ans=0.035 2024-08-12 09:32:00,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-12 09:32:03,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1571460.0, ans=0.0 2024-08-12 09:32:09,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1571460.0, ans=0.0 2024-08-12 09:32:28,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.51 vs. limit=10.0 2024-08-12 09:32:29,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. 
limit=15.0 2024-08-12 09:32:47,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12250, loss[loss=0.1093, beats_loss=0.009229, ecapa_loss=0.0001502, whisper_loss=0.09859, over 19278.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01101, ecapa_loss=0.0001794, whisper_loss=0.09207, over 3876404.99 frames. ], batch size: 71, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:32:47,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1571760.0, ans=10.0 2024-08-12 09:32:48,998 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 09:32:53,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1571760.0, ans=0.1 2024-08-12 09:32:57,827 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 09:33:06,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1571860.0, ans=0.125 2024-08-12 09:33:24,416 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 09:33:24,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1571960.0, ans=0.0 2024-08-12 09:33:34,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1572060.0, ans=0.125 2024-08-12 09:33:44,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1572060.0, ans=0.0 2024-08-12 09:33:48,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1572160.0, ans=0.1 2024-08-12 09:33:51,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1572160.0, ans=0.125 2024-08-12 09:33:51,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.601e+01 2.927e+01 3.328e+01 4.694e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 09:33:52,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1572160.0, ans=0.125 2024-08-12 09:33:55,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1572160.0, ans=0.1 2024-08-12 09:34:00,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12300, loss[loss=0.09099, beats_loss=0.01229, ecapa_loss=0.0001883, whisper_loss=0.07682, over 15558.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001789, whisper_loss=0.09231, over 3881859.14 frames. 
], batch size: 63, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:34:02,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1572260.0, ans=22.5 2024-08-12 09:34:25,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-08-12 09:34:40,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1572460.0, ans=0.05 2024-08-12 09:34:49,308 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-12 09:35:12,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12350, loss[loss=0.1122, beats_loss=0.01006, ecapa_loss=0.000209, whisper_loss=0.1, over 15220.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01089, ecapa_loss=0.0001805, whisper_loss=0.09317, over 3891797.47 frames. ], batch size: 62, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:35:26,741 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 09:35:27,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1572860.0, ans=0.0 2024-08-12 09:35:37,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. 
limit=15.0 2024-08-12 09:35:53,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1572960.0, ans=0.125 2024-08-12 09:35:56,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1573060.0, ans=0.2 2024-08-12 09:36:02,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1573060.0, ans=0.0 2024-08-12 09:36:09,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1573160.0, ans=0.0 2024-08-12 09:36:11,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1573160.0, ans=0.2 2024-08-12 09:36:16,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.617e+01 3.064e+01 3.584e+01 5.581e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 09:36:25,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12400, loss[loss=0.1222, beats_loss=0.00946, ecapa_loss=0.0001849, whisper_loss=0.1108, over 21802.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01092, ecapa_loss=0.0001792, whisper_loss=0.09292, over 3924166.45 frames. ], batch size: 87, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:36:30,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1573260.0, ans=0.0 2024-08-12 09:36:36,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.23 vs. 
limit=15.0 2024-08-12 09:36:44,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1573360.0, ans=0.1 2024-08-12 09:36:45,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1573360.0, ans=0.125 2024-08-12 09:36:49,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-12 09:36:52,907 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 09:36:57,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1573460.0, ans=0.125 2024-08-12 09:37:06,955 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 28 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-12 09:37:07,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1573460.0, ans=0.125 2024-08-12 09:37:13,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1573560.0, ans=0.125 2024-08-12 09:37:18,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1573560.0, ans=0.1 2024-08-12 09:37:29,680 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 09:37:38,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12450, loss[loss=0.1054, beats_loss=0.009194, ecapa_loss=0.0002262, whisper_loss=0.09396, over 20245.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01093, ecapa_loss=0.0001792, whisper_loss=0.09292, over 3924230.65 frames. 
], batch size: 88, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:38:12,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2024-08-12 09:38:16,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1573960.0, ans=0.09899494936611666 2024-08-12 09:38:16,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2024-08-12 09:38:19,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1573960.0, ans=0.125 2024-08-12 09:38:23,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1574060.0, ans=0.0 2024-08-12 09:38:40,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.466e+01 2.753e+01 3.048e+01 4.353e+01, threshold=5.506e+01, percent-clipped=0.0 2024-08-12 09:38:49,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12500, loss[loss=0.1174, beats_loss=0.01373, ecapa_loss=0.0001395, whisper_loss=0.1023, over 13961.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01093, ecapa_loss=0.0001795, whisper_loss=0.09354, over 3939266.23 frames. ], batch size: 55, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:38:49,420 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 09:38:52,392 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
14 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 09:38:57,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1574260.0, ans=0.125 2024-08-12 09:39:05,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1574360.0, ans=0.125 2024-08-12 09:39:07,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1574360.0, ans=0.125 2024-08-12 09:39:20,796 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 09:39:21,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-08-12 09:39:22,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1574460.0, ans=0.0 2024-08-12 09:39:28,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1574460.0, ans=0.125 2024-08-12 09:39:33,283 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 09:39:37,265 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 09:39:46,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1574660.0, ans=0.125 2024-08-12 09:39:59,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12550, loss[loss=0.1078, beats_loss=0.009112, ecapa_loss=0.0002349, whisper_loss=0.09636, over 18037.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01099, ecapa_loss=0.0001792, whisper_loss=0.09313, over 3924026.99 frames. 
], batch size: 74, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:40:08,149 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 09:40:17,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-12 09:40:19,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-08-12 09:40:28,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1574960.0, ans=0.0 2024-08-12 09:40:43,100 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 09:40:52,008 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 09:41:01,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.516e+01 2.754e+01 3.207e+01 3.892e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 09:41:10,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12600, loss[loss=0.1109, beats_loss=0.01175, ecapa_loss=0.0001622, whisper_loss=0.09753, over 22213.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001796, whisper_loss=0.09293, over 3906063.89 frames. 
], batch size: 89, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:41:17,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1575260.0, ans=0.125 2024-08-12 09:41:24,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1575360.0, ans=0.07 2024-08-12 09:41:41,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1575460.0, ans=0.0 2024-08-12 09:41:55,141 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 09:41:56,460 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 09:41:57,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1575560.0, ans=0.0 2024-08-12 09:42:01,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1575560.0, ans=0.125 2024-08-12 09:42:02,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-12 09:42:07,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1575660.0, ans=0.1 2024-08-12 09:42:13,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1575660.0, ans=0.125 2024-08-12 09:42:20,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12650, loss[loss=0.1138, beats_loss=0.01031, ecapa_loss=0.0001626, whisper_loss=0.1018, over 18014.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001779, whisper_loss=0.09257, over 3906143.85 frames. 
], batch size: 70, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:42:50,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1575960.0, ans=0.0 2024-08-12 09:43:12,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1576060.0, ans=0.1 2024-08-12 09:43:20,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1576160.0, ans=0.2 2024-08-12 09:43:22,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.519e+01 2.747e+01 3.019e+01 4.514e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 09:43:27,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1576160.0, ans=0.1 2024-08-12 09:43:30,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12700, loss[loss=0.1122, beats_loss=0.01192, ecapa_loss=0.0002042, whisper_loss=0.09823, over 21917.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001785, whisper_loss=0.09279, over 3865285.39 frames. ], batch size: 92, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:43:31,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1576260.0, ans=0.0 2024-08-12 09:43:32,291 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 09:43:32,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1576260.0, ans=0.125 2024-08-12 09:43:36,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. 
limit=15.0 2024-08-12 09:43:55,405 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.041e-02 2024-08-12 09:43:56,538 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 09:43:58,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1576460.0, ans=0.125 2024-08-12 09:44:02,471 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 09:44:03,828 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 09:44:39,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-12 09:44:41,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12750, loss[loss=0.09546, beats_loss=0.009996, ecapa_loss=0.0002334, whisper_loss=0.08313, over 20091.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001802, whisper_loss=0.09258, over 3896699.91 frames. ], batch size: 87, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:45:01,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1576860.0, ans=0.0 2024-08-12 09:45:05,718 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 09:45:10,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1576960.0, ans=0.1 2024-08-12 09:45:19,626 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-12 09:45:42,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1577160.0, ans=0.0 2024-08-12 09:45:43,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.827e+01 3.190e+01 5.112e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 09:45:51,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1577260.0, ans=0.1 2024-08-12 09:45:51,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12800, loss[loss=0.1042, beats_loss=0.01123, ecapa_loss=0.0001804, whisper_loss=0.09114, over 17734.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01115, ecapa_loss=0.0001802, whisper_loss=0.09246, over 3866887.74 frames. ], batch size: 68, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:46:10,148 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 09:46:12,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1577360.0, ans=0.125 2024-08-12 09:46:17,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-08-12 09:46:23,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=15.0 2024-08-12 09:46:32,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1577460.0, ans=0.125 2024-08-12 09:46:38,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1577560.0, ans=0.1 2024-08-12 09:46:47,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1577660.0, ans=0.07 2024-08-12 09:46:55,956 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 09:47:02,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12850, loss[loss=0.09399, beats_loss=0.01305, ecapa_loss=0.0001591, whisper_loss=0.07935, over 13323.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01122, ecapa_loss=0.00018, whisper_loss=0.09184, over 3861839.24 frames. ], batch size: 53, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:47:05,326 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 09:47:10,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1577760.0, ans=15.0 2024-08-12 09:47:21,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1577860.0, ans=0.0 2024-08-12 09:47:24,150 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 09:47:34,982 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 09:47:42,225 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
14 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 09:47:42,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1577960.0, ans=0.125 2024-08-12 09:47:50,406 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 37 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 09:48:04,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.513e+01 2.797e+01 3.147e+01 4.860e+01, threshold=5.595e+01, percent-clipped=0.0 2024-08-12 09:48:04,514 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 09:48:11,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1578260.0, ans=0.125 2024-08-12 09:48:12,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12900, loss[loss=0.1091, beats_loss=0.00977, ecapa_loss=0.0002158, whisper_loss=0.09714, over 19568.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01119, ecapa_loss=0.0001798, whisper_loss=0.09179, over 3872027.96 frames. ], batch size: 80, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:48:28,114 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 09:49:01,767 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.589e-02 2024-08-12 09:49:09,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-12 09:49:14,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. 
limit=6.0 2024-08-12 09:49:21,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 12950, loss[loss=0.108, beats_loss=0.008127, ecapa_loss=0.0002507, whisper_loss=0.09734, over 16764.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01113, ecapa_loss=0.0001795, whisper_loss=0.09161, over 3872733.10 frames. ], batch size: 69, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:49:42,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1578860.0, ans=0.0 2024-08-12 09:50:02,110 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 09:50:09,307 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 09:50:16,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1579060.0, ans=0.0 2024-08-12 09:50:19,288 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 09:50:24,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.602e+01 2.996e+01 3.291e+01 5.195e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-12 09:50:28,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1579160.0, ans=0.125 2024-08-12 09:50:30,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5 2024-08-12 09:50:33,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13000, loss[loss=0.1014, beats_loss=0.009395, ecapa_loss=0.000157, whisper_loss=0.0904, over 15085.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001802, whisper_loss=0.09283, over 3913356.29 frames. 
], batch size: 56, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:50:36,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-12 09:50:45,167 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 09:50:56,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1579360.0, ans=0.125 2024-08-12 09:50:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1579360.0, ans=0.125 2024-08-12 09:51:04,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-08-12 09:51:05,074 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-12 09:51:20,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.87 vs. limit=22.5 2024-08-12 09:51:20,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1579560.0, ans=0.125 2024-08-12 09:51:23,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1579560.0, ans=0.2 2024-08-12 09:51:31,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-12 09:51:36,454 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 09:51:36,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1579660.0, ans=0.125 2024-08-12 09:51:44,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13050, loss[loss=0.07589, beats_loss=0.01463, ecapa_loss=0.0002105, whisper_loss=0.05915, over 13911.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.00018, whisper_loss=0.09179, over 3867032.16 frames. ], batch size: 61, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:51:57,400 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-12 09:52:24,427 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 from AS 2024-08-12 09:52:25,618 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 12 from Vox, 24 from AS 2024-08-12 09:52:38,217 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 09:52:46,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.449e+01 2.683e+01 3.089e+01 1.742e+02, threshold=5.367e+01, percent-clipped=1.0 2024-08-12 09:52:49,616 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 from AS 2024-08-12 09:52:54,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13100, loss[loss=0.1122, beats_loss=0.00973, ecapa_loss=0.0001647, whisper_loss=0.1008, over 23953.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001791, whisper_loss=0.09181, over 3860432.74 frames.
], batch size: 93, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:53:02,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1580260.0, ans=0.0 2024-08-12 09:53:28,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1580460.0, ans=0.125 2024-08-12 09:53:37,026 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-12 09:53:58,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1580660.0, ans=0.125 2024-08-12 09:54:02,351 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-12 09:54:05,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13150, loss[loss=0.1237, beats_loss=0.01013, ecapa_loss=0.0001735, whisper_loss=0.1119, over 20176.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001795, whisper_loss=0.09158, over 3875188.50 frames. ], batch size: 81, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:54:07,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1580760.0, ans=0.125 2024-08-12 09:54:41,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1580960.0, ans=0.125 2024-08-12 09:55:07,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.633e+01 2.862e+01 3.411e+01 5.758e+01, threshold=5.724e+01, percent-clipped=1.0 2024-08-12 09:55:16,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13200, loss[loss=0.08311, beats_loss=0.01245, ecapa_loss=0.0002029, whisper_loss=0.06863, over 16218.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001791, whisper_loss=0.09237, over 3859797.36 frames.
], batch size: 71, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:55:32,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1581360.0, ans=0.125 2024-08-12 09:55:42,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1581360.0, ans=0.125 2024-08-12 09:55:49,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1581460.0, ans=0.0 2024-08-12 09:55:53,797 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 from AS 2024-08-12 09:55:59,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1581560.0, ans=0.0 2024-08-12 09:56:09,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1581560.0, ans=0.05 2024-08-12 09:56:27,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13250, loss[loss=0.1197, beats_loss=0.007734, ecapa_loss=0.0002344, whisper_loss=0.1096, over 15281.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001785, whisper_loss=0.09233, over 3872493.18 frames. ], batch size: 59, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:56:28,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1581760.0, ans=0.2 2024-08-12 09:56:29,552 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 09:56:44,521 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.051e-02 2024-08-12 09:56:45,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1581860.0, ans=0.0 2024-08-12 09:57:04,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1581960.0, ans=0.1 2024-08-12 09:57:10,870 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 from AS 2024-08-12 09:57:15,075 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-12 09:57:15,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1582060.0, ans=0.125 2024-08-12 09:57:19,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1582060.0, ans=0.1 2024-08-12 09:57:29,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1582160.0, ans=0.0 2024-08-12 09:57:30,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.621e+01 2.894e+01 3.453e+01 5.158e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-12 09:57:38,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13300, loss[loss=0.09828, beats_loss=0.01267, ecapa_loss=0.0001868, whisper_loss=0.08375, over 21004.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001794, whisper_loss=0.09232, over 3892753.30 frames.
], batch size: 88, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:57:41,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1582260.0, ans=0.1 2024-08-12 09:57:42,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1582260.0, ans=0.1 2024-08-12 09:57:51,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=12.0 2024-08-12 09:57:54,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-12 09:58:10,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1582460.0, ans=0.125 2024-08-12 09:58:10,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2024-08-12 09:58:11,374 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 11 from Vox, 42 from AS 2024-08-12 09:58:16,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1582460.0, ans=0.125 2024-08-12 09:58:22,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1582560.0, ans=0.2 2024-08-12 09:58:35,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=12.0 2024-08-12 09:58:37,930 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
17 from LS+wenet, 21 from Vox, 23 from AS 2024-08-12 09:58:49,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13350, loss[loss=0.09865, beats_loss=0.01212, ecapa_loss=0.0001718, whisper_loss=0.08481, over 18965.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.000179, whisper_loss=0.09181, over 3841034.15 frames. ], batch size: 77, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:59:04,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1582860.0, ans=0.125 2024-08-12 09:59:27,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1582960.0, ans=0.0 2024-08-12 09:59:34,052 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 09:59:39,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1583060.0, ans=0.125 2024-08-12 09:59:41,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1583060.0, ans=0.125 2024-08-12 09:59:42,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1583060.0, ans=0.0 2024-08-12 09:59:51,010 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:59:51,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.599e+01 2.960e+01 3.368e+01 5.094e+01, threshold=5.919e+01, percent-clipped=0.0 2024-08-12 09:59:52,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1583160.0, ans=0.125 2024-08-12 09:59:55,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1,
num_channels=384, metric=10.38 vs. limit=15.0 2024-08-12 09:59:59,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-12 10:00:00,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13400, loss[loss=0.1153, beats_loss=0.01241, ecapa_loss=0.000211, whisper_loss=0.1008, over 20448.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001784, whisper_loss=0.09181, over 3845895.43 frames. ], batch size: 85, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:00:03,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1583260.0, ans=0.1 2024-08-12 10:00:03,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1583260.0, ans=0.125 2024-08-12 10:00:08,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. limit=15.0 2024-08-12 10:00:27,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1583460.0, ans=0.1 2024-08-12 10:00:39,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1583460.0, ans=0.2 2024-08-12 10:00:54,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1583560.0, ans=10.0 2024-08-12 10:01:10,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13450, loss[loss=0.1082, beats_loss=0.0134, ecapa_loss=0.0001319, whisper_loss=0.09352, over 23141.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001777, whisper_loss=0.09224, over 3848120.94 frames. 
], batch size: 91, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:01:10,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-12 10:01:17,319 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS 2024-08-12 10:01:27,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1583860.0, ans=0.125 2024-08-12 10:01:36,666 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 from AS 2024-08-12 10:01:38,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1583960.0, ans=0.125 2024-08-12 10:01:38,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1583960.0, ans=0.1 2024-08-12 10:01:42,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:01:43,799 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 10:01:45,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:01:46,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:01:48,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1583960.0, ans=0.09899494936611666 2024-08-12 10:01:55,534 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
35 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 10:02:07,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1584160.0, ans=0.125 2024-08-12 10:02:11,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.426e+01 2.699e+01 3.096e+01 4.776e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-12 10:02:19,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13500, loss[loss=0.1067, beats_loss=0.01073, ecapa_loss=0.000153, whisper_loss=0.09447, over 20458.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001781, whisper_loss=0.09175, over 3851646.15 frames. ], batch size: 78, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:02:30,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1584260.0, ans=0.0 2024-08-12 10:02:31,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1584260.0, ans=0.2 2024-08-12 10:02:37,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.92 vs.
limit=22.5 2024-08-12 10:03:02,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1584560.0, ans=0.125 2024-08-12 10:03:08,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1584560.0, ans=0.2 2024-08-12 10:03:09,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1584560.0, ans=0.0 2024-08-12 10:03:17,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1584660.0, ans=0.0 2024-08-12 10:03:22,769 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.240e-03 2024-08-12 10:03:30,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13550, loss[loss=0.09302, beats_loss=0.01148, ecapa_loss=0.0001748, whisper_loss=0.0798, over 21161.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01115, ecapa_loss=0.0001765, whisper_loss=0.09123, over 3866440.41 frames. ], batch size: 87, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:03:36,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1584760.0, ans=0.05 2024-08-12 10:04:24,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.02 vs. 
limit=15.0 2024-08-12 10:04:31,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.628e+01 2.875e+01 3.352e+01 5.913e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 10:04:38,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1585160.0, ans=0.125 2024-08-12 10:04:40,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13600, loss[loss=0.1062, beats_loss=0.008961, ecapa_loss=0.0002339, whisper_loss=0.09488, over 17322.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01106, ecapa_loss=0.0001774, whisper_loss=0.09237, over 3888867.73 frames. ], batch size: 72, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:04:57,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1585360.0, ans=0.125 2024-08-12 10:05:22,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-12 10:05:38,425 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS 2024-08-12 10:05:44,749 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 15 from Vox, 40 from AS 2024-08-12 10:05:48,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13650, loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.000126, whisper_loss=0.09548, over 16824.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01106, ecapa_loss=0.0001772, whisper_loss=0.09292, over 3902148.11 frames. ], batch size: 62, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:06:14,292 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
21 from LS+wenet, 19 from Vox, 19 from AS 2024-08-12 10:06:16,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1585960.0, ans=0.125 2024-08-12 10:06:25,392 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS 2024-08-12 10:06:50,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.547e+01 2.720e+01 3.156e+01 5.627e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 10:06:53,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1586160.0, ans=0.125 2024-08-12 10:06:59,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13700, loss[loss=0.1397, beats_loss=0.008047, ecapa_loss=0.0001722, whisper_loss=0.1299, over 24267.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01096, ecapa_loss=0.0001771, whisper_loss=0.09377, over 3891848.36 frames. ], batch size: 91, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:07:19,154 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 from AS 2024-08-12 10:08:00,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1586660.0, ans=0.0 2024-08-12 10:08:04,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-12 10:08:09,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13750, loss[loss=0.1017, beats_loss=0.01131, ecapa_loss=0.0001797, whisper_loss=0.08862, over 20610.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01094, ecapa_loss=0.0001775, whisper_loss=0.09283, over 3873286.83 frames.
], batch size: 84, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:08:27,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1586860.0, ans=0.2 2024-08-12 10:08:27,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-12 10:08:40,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1586960.0, ans=0.125 2024-08-12 10:08:42,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1586960.0, ans=0.0 2024-08-12 10:08:51,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1587060.0, ans=0.0 2024-08-12 10:09:00,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1587060.0, ans=0.125 2024-08-12 10:09:11,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.427e+01 2.784e+01 3.131e+01 5.573e+01, threshold=5.568e+01, percent-clipped=1.0 2024-08-12 10:09:19,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1587260.0, ans=0.2 2024-08-12 10:09:20,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13800, loss[loss=0.0888, beats_loss=0.01381, ecapa_loss=0.0001753, whisper_loss=0.07323, over 13592.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01095, ecapa_loss=0.0001775, whisper_loss=0.09312, over 3866339.13 frames. ], batch size: 56, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:09:43,649 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 from AS 2024-08-12 10:10:20,262 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
34 from LS+wenet, 13 from Vox, 45 from AS 2024-08-12 10:10:22,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-12 10:10:32,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13850, loss[loss=0.1071, beats_loss=0.01185, ecapa_loss=0.0001691, whisper_loss=0.09358, over 22593.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001764, whisper_loss=0.09264, over 3862599.86 frames. ], batch size: 91, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:11:08,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1587960.0, ans=0.125 2024-08-12 10:11:12,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1587960.0, ans=0.2 2024-08-12 10:11:14,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1588060.0, ans=0.0 2024-08-12 10:11:19,685 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 10:11:31,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1588160.0, ans=0.0 2024-08-12 10:11:35,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.570e+01 2.844e+01 3.264e+01 2.322e+02, threshold=5.688e+01, percent-clipped=2.0 2024-08-12 10:11:38,137 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 10:11:44,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13900, loss[loss=0.08325, beats_loss=0.01471, ecapa_loss=0.0001492, whisper_loss=0.06705, over 14671.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001781, whisper_loss=0.0932, over 3862629.66 frames.
], batch size: 58, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:11:48,995 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS 2024-08-12 10:12:01,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1588360.0, ans=0.125 2024-08-12 10:12:23,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1588460.0, ans=0.0 2024-08-12 10:12:30,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1588560.0, ans=0.0 2024-08-12 10:12:40,778 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 from AS 2024-08-12 10:13:00,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 13950, loss[loss=0.1168, beats_loss=0.01132, ecapa_loss=0.000169, whisper_loss=0.1038, over 22894.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01091, ecapa_loss=0.0001777, whisper_loss=0.09338, over 3894539.08 frames. ], batch size: 89, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:13:09,624 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 10:13:24,445 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 15 from Vox, 43 from AS 2024-08-12 10:13:36,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1588960.0, ans=0.2 2024-08-12 10:13:37,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs.
limit=6.0 2024-08-12 10:13:38,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1588960.0, ans=0.09899494936611666 2024-08-12 10:13:40,067 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS 2024-08-12 10:13:43,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1588960.0, ans=0.125 2024-08-12 10:13:46,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1588960.0, ans=0.1 2024-08-12 10:13:57,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2024-08-12 10:14:08,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1589160.0, ans=0.125 2024-08-12 10:14:14,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.447e+01 2.683e+01 3.149e+01 1.029e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-12 10:14:20,899 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 27 from Vox, 29 from AS 2024-08-12 10:14:23,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14000, loss[loss=0.1113, beats_loss=0.008703, ecapa_loss=0.0001998, whisper_loss=0.1006, over 18873.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01092, ecapa_loss=0.0001783, whisper_loss=0.09353, over 3907268.85 frames.
], batch size: 74, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:14:38,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1589260.0, ans=0.125 2024-08-12 10:14:54,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1589360.0, ans=0.2 2024-08-12 10:14:55,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1589460.0, ans=0.125 2024-08-12 10:14:57,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1589460.0, ans=0.1 2024-08-12 10:15:04,254 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 from AS 2024-08-12 10:15:04,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1589460.0, ans=0.0 2024-08-12 10:15:21,250 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 from AS 2024-08-12 10:15:41,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14050, loss[loss=0.109, beats_loss=0.01074, ecapa_loss=0.0001429, whisper_loss=0.09683, over 23331.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01097, ecapa_loss=0.0001778, whisper_loss=0.09291, over 3906182.04 frames. ], batch size: 90, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:15:50,562 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
12 from LS+wenet, 20 from Vox, 25 from AS 2024-08-12 10:15:50,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1589760.0, ans=0.125 2024-08-12 10:16:30,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1590060.0, ans=0.125 2024-08-12 10:16:54,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.972e+01 3.503e+01 4.652e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-12 10:17:03,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14100, loss[loss=0.1141, beats_loss=0.01233, ecapa_loss=0.00017, whisper_loss=0.1, over 22432.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0111, ecapa_loss=0.0001764, whisper_loss=0.09181, over 3874737.39 frames. ], batch size: 92, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:17:12,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1590260.0, ans=0.125 2024-08-12 10:17:21,553 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 from AS 2024-08-12 10:17:27,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1590360.0, ans=0.0 2024-08-12 10:17:32,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-12 10:17:39,682 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 from AS 2024-08-12 10:17:40,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1590460.0, ans=0.125 2024-08-12 10:17:54,885 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts.
31 from LS+wenet, 25 from Vox, 28 from AS 2024-08-12 10:17:58,399 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 10:18:19,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1590660.0, ans=0.2 2024-08-12 10:18:23,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14150, loss[loss=0.1153, beats_loss=0.01123, ecapa_loss=0.0001643, whisper_loss=0.1024, over 18926.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001769, whisper_loss=0.09235, over 3840760.53 frames. ], batch size: 73, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:18:33,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1590760.0, ans=0.2 2024-08-12 10:18:44,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1590860.0, ans=0.1 2024-08-12 10:18:49,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1590860.0, ans=0.07 2024-08-12 10:18:49,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1590860.0, ans=0.125 2024-08-12 10:19:15,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1591060.0, ans=0.125 2024-08-12 10:19:27,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1591060.0, ans=0.0 2024-08-12 10:19:34,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1591160.0, ans=0.2 2024-08-12 10:19:39,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.532e+01 2.801e+01 3.352e+01 7.282e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12
10:19:49,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14200, loss[loss=0.1095, beats_loss=0.0113, ecapa_loss=0.0001732, whisper_loss=0.09645, over 16293.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01113, ecapa_loss=0.0001756, whisper_loss=0.0922, over 3891423.92 frames. ], batch size: 63, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:19:50,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-12 10:20:03,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1591260.0, ans=0.125 2024-08-12 10:20:06,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1591360.0, ans=0.1 2024-08-12 10:20:13,484 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 10:20:13,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1591360.0, ans=0.0 2024-08-12 10:20:16,659 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 10:20:20,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1591360.0, ans=0.0 2024-08-12 10:20:37,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.78 vs. 
limit=12.0 2024-08-12 10:20:40,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1591560.0, ans=0.2 2024-08-12 10:20:48,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1591560.0, ans=0.0 2024-08-12 10:21:00,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1591660.0, ans=0.0 2024-08-12 10:21:03,177 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 10:21:11,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-12 10:21:11,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14250, loss[loss=0.1111, beats_loss=0.01155, ecapa_loss=0.0001795, whisper_loss=0.09776, over 21130.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01113, ecapa_loss=0.000175, whisper_loss=0.09224, over 3915592.37 frames. ], batch size: 84, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:21:12,153 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 10:21:15,210 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 10:21:50,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-12 10:22:17,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1592160.0, ans=0.07 2024-08-12 10:22:21,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. 
limit=15.0 2024-08-12 10:22:22,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.447e+01 2.773e+01 3.183e+01 5.230e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 10:22:33,131 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14300, loss[loss=0.1158, beats_loss=0.009247, ecapa_loss=0.0001736, whisper_loss=0.1048, over 17737.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0111, ecapa_loss=0.0001764, whisper_loss=0.09175, over 3907363.28 frames. ], batch size: 70, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:22:38,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1592260.0, ans=0.2 2024-08-12 10:22:40,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1592260.0, ans=0.0 2024-08-12 10:22:40,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1592260.0, ans=0.125 2024-08-12 10:23:18,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1592460.0, ans=0.0 2024-08-12 10:23:19,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1592460.0, ans=0.1 2024-08-12 10:23:28,327 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:23:34,521 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 10:23:36,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1592560.0, ans=0.2 2024-08-12 10:23:38,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. 
limit=15.0 2024-08-12 10:23:55,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14350, loss[loss=0.1092, beats_loss=0.01144, ecapa_loss=0.0002017, whisper_loss=0.09577, over 22446.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.000177, whisper_loss=0.0921, over 3905457.24 frames. ], batch size: 91, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:24:13,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1592860.0, ans=0.125 2024-08-12 10:24:19,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-08-12 10:24:39,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1593060.0, ans=0.04949747468305833 2024-08-12 10:24:46,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.56 vs. limit=6.0 2024-08-12 10:24:53,124 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 10:24:59,332 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:25:00,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.480e+01 2.799e+01 3.080e+01 4.714e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 10:25:08,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14400, loss[loss=0.08188, beats_loss=0.01229, ecapa_loss=0.0001438, whisper_loss=0.06815, over 14114.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01095, ecapa_loss=0.0001794, whisper_loss=0.09295, over 3906884.53 frames. ], batch size: 56, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:25:08,601 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 10:25:35,431 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 10:26:12,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1593660.0, ans=0.125 2024-08-12 10:26:21,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 11, batch 14450, loss[loss=0.09303, beats_loss=0.01074, ecapa_loss=0.0001846, whisper_loss=0.08044, over 22742.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01095, ecapa_loss=0.0001795, whisper_loss=0.09265, over 3897045.11 frames. ], batch size: 93, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:26:22,091 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 10:26:23,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1593760.0, ans=0.125 2024-08-12 10:26:35,380 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.902e+00 2024-08-12 10:26:39,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1593860.0, ans=0.0 2024-08-12 10:26:49,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1593960.0, ans=0.125 2024-08-12 10:26:51,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1593960.0, ans=0.0 2024-08-12 10:27:01,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1593960.0, ans=0.0 2024-08-12 10:27:48,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 0, loss[loss=0.1357, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.124, over 24663.00 frames. 
], tot_loss[loss=0.1357, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.124, over 24663.00 frames. ], batch size: 89, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:27:48,658 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 10:28:26,846 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on ASR_libri: loss=0.2553, beats_loss=0, ecapa_loss=0.0005949, whisper_loss=0.2493, over 922467.00 frames. 2024-08-12 10:28:43,336 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on SV_voxceleb1: loss=0.004912, beats_loss=0, ecapa_loss=0.0004912, whisper_loss=0, over 939242.00 frames. 2024-08-12 10:30:40,446 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on AT_audioset: loss=0.02433, beats_loss=0.02433, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 10:30:40,453 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 10:30:46,461 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 10:30:59,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1594110.0, ans=0.125 2024-08-12 10:30:59,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.491e+01 2.893e+01 3.197e+01 9.364e+01, threshold=5.786e+01, percent-clipped=1.0 2024-08-12 10:31:05,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1594210.0, ans=0.0 2024-08-12 10:31:32,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1594310.0, ans=0.125 2024-08-12 10:32:24,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 50, loss[loss=0.1095, beats_loss=0.007808, ecapa_loss=0.0002167, whisper_loss=0.0995, over 16988.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01009, ecapa_loss=0.0001887, whisper_loss=0.0914, over 883735.19 frames. ], batch size: 68, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:32:30,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1594610.0, ans=0.1 2024-08-12 10:32:44,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1594710.0, ans=0.0 2024-08-12 10:32:57,467 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 10:32:57,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1594710.0, ans=0.125 2024-08-12 10:33:25,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1594810.0, ans=0.04949747468305833 2024-08-12 10:33:37,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1594910.0, ans=0.125 2024-08-12 10:33:46,232 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 10:34:07,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595010.0, ans=0.1 2024-08-12 10:34:14,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 100, loss[loss=0.1009, beats_loss=0.01112, ecapa_loss=0.0001226, whisper_loss=0.0886, over 19618.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0102, ecapa_loss=0.000181, whisper_loss=0.09266, over 1567896.58 frames. ], batch size: 72, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:34:17,662 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 10:34:27,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1595110.0, ans=0.1 2024-08-12 10:34:34,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.774e+01 3.018e+01 3.442e+01 6.372e+01, threshold=6.036e+01, percent-clipped=2.0 2024-08-12 10:35:21,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1595410.0, ans=0.125 2024-08-12 10:35:21,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1595410.0, ans=0.0 2024-08-12 10:35:23,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1595410.0, ans=0.125 2024-08-12 10:36:03,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 150, loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001523, whisper_loss=0.09285, over 20727.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01019, ecapa_loss=0.0001768, whisper_loss=0.09313, over 2083239.10 frames. ], batch size: 79, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:36:25,371 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:36:26,628 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 10:36:32,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. 
limit=15.0 2024-08-12 10:36:37,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1595710.0, ans=0.125 2024-08-12 10:36:40,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1595810.0, ans=0.125 2024-08-12 10:36:49,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595810.0, ans=0.1 2024-08-12 10:36:59,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1595910.0, ans=0.0 2024-08-12 10:37:00,924 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-12 10:37:02,977 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 10:37:05,041 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 10:37:05,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1595910.0, ans=15.0 2024-08-12 10:37:06,289 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 10:37:18,098 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 10:37:25,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1596010.0, ans=0.0 2024-08-12 10:37:31,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 200, loss[loss=0.1043, beats_loss=0.01113, ecapa_loss=0.0001484, whisper_loss=0.09169, over 23792.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01038, ecapa_loss=0.0001768, whisper_loss=0.09213, over 2460911.25 frames. 
], batch size: 94, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:37:49,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.723e+01 2.993e+01 3.587e+01 5.466e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-12 10:38:03,649 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 10:38:10,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1596310.0, ans=0.0 2024-08-12 10:38:17,199 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 10:38:27,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1596410.0, ans=0.125 2024-08-12 10:38:31,684 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-12 10:38:47,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1596510.0, ans=0.125 2024-08-12 10:38:54,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1596510.0, ans=0.125 2024-08-12 10:38:59,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 250, loss[loss=0.09177, beats_loss=0.01385, ecapa_loss=0.0001559, whisper_loss=0.07636, over 19896.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01053, ecapa_loss=0.0001752, whisper_loss=0.09257, over 2759286.60 frames. ], batch size: 83, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:39:20,808 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 10:39:21,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1596710.0, ans=0.125 2024-08-12 10:39:29,257 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:39:35,308 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 10:39:42,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1596810.0, ans=10.0 2024-08-12 10:39:49,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1596910.0, ans=0.2 2024-08-12 10:39:59,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1596910.0, ans=0.125 2024-08-12 10:39:59,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1596910.0, ans=0.2 2024-08-12 10:40:07,508 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 10:40:19,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 300, loss[loss=0.1072, beats_loss=0.01125, ecapa_loss=0.0001435, whisper_loss=0.09448, over 16411.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001756, whisper_loss=0.09182, over 2997873.97 frames. ], batch size: 63, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:40:29,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. 
limit=22.5 2024-08-12 10:40:34,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.533e+01 2.859e+01 3.181e+01 4.204e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 10:40:37,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-12 10:41:00,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1597310.0, ans=0.125 2024-08-12 10:41:02,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1597310.0, ans=0.125 2024-08-12 10:41:06,198 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 34 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 10:41:11,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1597410.0, ans=0.125 2024-08-12 10:41:19,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1597410.0, ans=0.0 2024-08-12 10:41:24,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1597510.0, ans=0.0 2024-08-12 10:41:39,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 350, loss[loss=0.09999, beats_loss=0.01196, ecapa_loss=0.0002063, whisper_loss=0.08597, over 21471.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01087, ecapa_loss=0.000174, whisper_loss=0.09058, over 3173975.21 frames. ], batch size: 88, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:42:30,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1597910.0, ans=0.125 2024-08-12 10:42:33,069 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
23 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 10:42:34,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1597910.0, ans=0.0 2024-08-12 10:42:35,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.10 vs. limit=22.5 2024-08-12 10:42:36,075 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 10:42:42,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1598010.0, ans=0.2 2024-08-12 10:42:47,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-12 10:42:52,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-12 10:42:54,903 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 10:42:56,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-12 10:42:57,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 400, loss[loss=0.1155, beats_loss=0.01088, ecapa_loss=0.0001708, whisper_loss=0.1029, over 22538.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.000173, whisper_loss=0.09056, over 3291769.19 frames. ], batch size: 91, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:43:02,577 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 10:43:11,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.512e+01 2.716e+01 3.145e+01 4.909e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-12 10:43:39,493 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 10:43:50,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1598410.0, ans=0.125 2024-08-12 10:43:51,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1598410.0, ans=0.0 2024-08-12 10:44:10,065 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 10:44:11,516 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-12 10:44:11,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1598510.0, ans=0.09899494936611666 2024-08-12 10:44:15,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 450, loss[loss=0.08996, beats_loss=0.01174, ecapa_loss=0.0001491, whisper_loss=0.07673, over 16784.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01093, ecapa_loss=0.0001724, whisper_loss=0.0906, over 3402337.38 frames. ], batch size: 64, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:44:16,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1598610.0, ans=0.0 2024-08-12 10:44:19,088 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 10:44:22,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.74 vs. limit=22.5 2024-08-12 10:44:58,576 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
9 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 10:45:10,231 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 10:45:19,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1599010.0, ans=0.125 2024-08-12 10:45:32,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-08-12 10:45:32,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 500, loss[loss=0.09682, beats_loss=0.01281, ecapa_loss=0.0001448, whisper_loss=0.08256, over 19627.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01102, ecapa_loss=0.0001722, whisper_loss=0.08971, over 3463296.64 frames. ], batch size: 78, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:45:39,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1599110.0, ans=0.0 2024-08-12 10:45:46,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.446e+01 2.825e+01 3.305e+01 5.621e+01, threshold=5.651e+01, percent-clipped=2.0 2024-08-12 10:45:50,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1599210.0, ans=0.125 2024-08-12 10:45:51,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1599210.0, ans=0.1 2024-08-12 10:45:53,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-12 10:46:05,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1599310.0, ans=0.0 2024-08-12 10:46:20,321 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 10:46:22,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1599410.0, ans=0.125 2024-08-12 10:46:46,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1599510.0, ans=0.125 2024-08-12 10:46:50,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1599510.0, ans=0.0 2024-08-12 10:46:52,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 550, loss[loss=0.1137, beats_loss=0.009242, ecapa_loss=0.0001845, whisper_loss=0.1026, over 17592.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.000172, whisper_loss=0.0905, over 3569719.81 frames. ], batch size: 69, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:46:53,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1599610.0, ans=0.125 2024-08-12 10:47:05,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1599610.0, ans=0.125 2024-08-12 10:47:26,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1599810.0, ans=0.0 2024-08-12 10:47:29,669 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 10:47:35,764 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 10:47:53,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1599910.0, ans=0.1 2024-08-12 10:48:14,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 600, loss[loss=0.0962, beats_loss=0.01134, ecapa_loss=0.0001717, whisper_loss=0.08315, over 15168.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01097, ecapa_loss=0.0001708, whisper_loss=0.09081, over 3648140.59 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:48:15,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1600110.0, ans=0.125 2024-08-12 10:48:28,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1600110.0, ans=15.0 2024-08-12 10:48:28,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.536e+01 2.795e+01 3.405e+01 6.348e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-12 10:48:56,797 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 10:49:26,955 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 10:49:31,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 650, loss[loss=0.09642, beats_loss=0.0107, ecapa_loss=0.0001687, whisper_loss=0.08404, over 19924.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01087, ecapa_loss=0.0001719, whisper_loss=0.09054, over 3679357.14 frames. ], batch size: 79, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:49:31,805 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 10:49:40,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1600610.0, ans=0.125 2024-08-12 10:49:40,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.67 vs. 
limit=22.5 2024-08-12 10:49:47,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1600710.0, ans=0.1 2024-08-12 10:49:48,745 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 10:50:08,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1600810.0, ans=0.04949747468305833 2024-08-12 10:50:15,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2024-08-12 10:50:52,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 700, loss[loss=0.0946, beats_loss=0.01416, ecapa_loss=0.0001736, whisper_loss=0.0787, over 20779.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01088, ecapa_loss=0.0001723, whisper_loss=0.09024, over 3717267.77 frames. ], batch size: 89, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:50:54,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1601110.0, ans=0.125 2024-08-12 10:50:55,906 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 10:50:56,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1601110.0, ans=0.125 2024-08-12 10:51:06,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.429e+01 2.647e+01 2.906e+01 4.054e+01, threshold=5.293e+01, percent-clipped=0.0 2024-08-12 10:51:07,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1601210.0, ans=0.125 2024-08-12 10:51:10,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1601210.0, ans=0.125 2024-08-12 10:51:12,864 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 10:51:22,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1601310.0, ans=0.0 2024-08-12 10:51:44,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1601410.0, ans=0.0 2024-08-12 10:51:51,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5 2024-08-12 10:51:52,326 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 10:52:09,322 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 30 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-12 10:52:10,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=15.0 2024-08-12 10:52:10,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 750, loss[loss=0.1421, beats_loss=0.00659, ecapa_loss=0.0002183, whisper_loss=0.1333, over 16190.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.0001713, whisper_loss=0.09026, over 3738951.81 frames. ], batch size: 63, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:52:26,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.69 vs. limit=22.5 2024-08-12 10:52:27,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1601710.0, ans=0.125 2024-08-12 10:52:30,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1601710.0, ans=15.0 2024-08-12 10:52:51,240 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 10:52:54,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1601810.0, ans=0.1 2024-08-12 10:53:05,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1601910.0, ans=0.125 2024-08-12 10:53:05,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-12 10:53:19,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1602010.0, ans=0.0 2024-08-12 10:53:26,946 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 30 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 10:53:29,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 800, loss[loss=0.1209, beats_loss=0.007522, ecapa_loss=0.0001573, whisper_loss=0.1118, over 20690.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01098, ecapa_loss=0.0001717, whisper_loss=0.09027, over 3770910.84 frames. 
], batch size: 78, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:53:45,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.463e+01 2.797e+01 3.235e+01 6.542e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 10:53:49,095 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 10:53:58,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1602210.0, ans=0.0 2024-08-12 10:54:04,898 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 10:54:06,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2024-08-12 10:54:09,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1602310.0, ans=0.0 2024-08-12 10:54:09,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1602310.0, ans=10.0 2024-08-12 10:54:09,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1602310.0, ans=0.125 2024-08-12 10:54:14,210 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 10:54:34,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1602510.0, ans=0.1 2024-08-12 10:54:36,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1602510.0, ans=0.125 2024-08-12 10:54:47,593 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 10:54:50,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 850, loss[loss=0.07716, beats_loss=0.01304, ecapa_loss=0.0002021, whisper_loss=0.0621, over 14152.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01099, ecapa_loss=0.0001701, whisper_loss=0.08968, over 3748094.64 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:54:59,950 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 10:55:04,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0 2024-08-12 10:55:22,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-12 10:55:28,172 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 10:55:58,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1603010.0, ans=0.125 2024-08-12 10:56:00,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-12 10:56:08,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1603110.0, ans=0.1 2024-08-12 10:56:09,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 900, loss[loss=0.1155, beats_loss=0.01053, ecapa_loss=0.0001516, whisper_loss=0.1034, over 23764.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01094, ecapa_loss=0.0001714, whisper_loss=0.0897, over 3734554.31 frames. 
], batch size: 90, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:56:14,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1603110.0, ans=0.125 2024-08-12 10:56:16,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1603110.0, ans=0.125 2024-08-12 10:56:29,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.446e+01 2.685e+01 3.025e+01 4.659e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 10:56:30,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1603210.0, ans=0.125 2024-08-12 10:56:37,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1603210.0, ans=0.125 2024-08-12 10:57:03,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1603410.0, ans=0.0 2024-08-12 10:57:33,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 950, loss[loss=0.09671, beats_loss=0.009521, ecapa_loss=0.000171, whisper_loss=0.08548, over 16773.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01096, ecapa_loss=0.0001709, whisper_loss=0.09006, over 3741040.02 frames. ], batch size: 64, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:58:18,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1603810.0, ans=0.2 2024-08-12 10:58:33,583 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 10:58:44,397 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 10:58:55,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1604010.0, ans=0.0 2024-08-12 10:59:01,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1604010.0, ans=0.0 2024-08-12 10:59:05,532 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 10:59:10,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1000, loss[loss=0.08361, beats_loss=0.01386, ecapa_loss=0.0001663, whisper_loss=0.06809, over 18276.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01102, ecapa_loss=0.0001705, whisper_loss=0.08991, over 3756664.87 frames. ], batch size: 76, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:59:20,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1604110.0, ans=0.1 2024-08-12 10:59:32,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.574e+01 2.849e+01 3.275e+01 5.377e+01, threshold=5.697e+01, percent-clipped=1.0 2024-08-12 10:59:51,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1604310.0, ans=0.125 2024-08-12 10:59:51,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1604310.0, ans=0.0 2024-08-12 10:59:56,186 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 11:00:15,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1604410.0, ans=0.125 2024-08-12 11:00:18,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.62 vs. 
limit=22.5 2024-08-12 11:00:26,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1604410.0, ans=0.125 2024-08-12 11:00:52,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5 2024-08-12 11:00:56,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1604510.0, ans=0.1 2024-08-12 11:00:58,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1050, loss[loss=0.1133, beats_loss=0.01025, ecapa_loss=0.000126, whisper_loss=0.1017, over 15570.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01113, ecapa_loss=0.0001695, whisper_loss=0.08983, over 3784015.56 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:01:04,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.250e-01 2024-08-12 11:01:23,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1604710.0, ans=0.125 2024-08-12 11:01:28,519 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 11:01:34,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=12.0 2024-08-12 11:01:50,807 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 11:02:56,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1605010.0, ans=0.125 2024-08-12 11:03:01,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1100, loss[loss=0.1002, beats_loss=0.01072, ecapa_loss=0.0001641, whisper_loss=0.08787, over 19578.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01101, ecapa_loss=0.0001697, whisper_loss=0.09071, over 3776165.11 frames. ], batch size: 78, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:03:06,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1605110.0, ans=0.2 2024-08-12 11:03:26,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1605210.0, ans=0.0 2024-08-12 11:03:27,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.546e+01 2.827e+01 3.274e+01 5.638e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 11:03:29,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1605210.0, ans=0.0 2024-08-12 11:03:46,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1605210.0, ans=0.125 2024-08-12 11:05:01,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.0 2024-08-12 11:05:08,797 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 11:05:09,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1150, loss[loss=0.09498, beats_loss=0.01005, ecapa_loss=0.0001764, whisper_loss=0.08317, over 18757.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001695, whisper_loss=0.09133, over 3792063.89 frames. 
], batch size: 75, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:05:23,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1605610.0, ans=0.1 2024-08-12 11:05:34,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1605610.0, ans=0.1 2024-08-12 11:05:36,097 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 11:05:44,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1605710.0, ans=0.125 2024-08-12 11:05:55,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1605710.0, ans=0.125 2024-08-12 11:06:14,046 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 11:06:15,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1605810.0, ans=0.0 2024-08-12 11:06:49,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1605910.0, ans=0.0 2024-08-12 11:07:04,466 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 11:07:09,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1606010.0, ans=0.125 2024-08-12 11:07:11,329 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 11:07:14,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1200, loss[loss=0.09766, beats_loss=0.01289, ecapa_loss=0.0001352, whisper_loss=0.08342, over 19159.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001705, whisper_loss=0.09188, over 3807412.15 frames. ], batch size: 75, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:07:15,582 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 11:07:26,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1606110.0, ans=0.2 2024-08-12 11:07:28,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1606110.0, ans=0.125 2024-08-12 11:07:36,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.362e+01 2.610e+01 2.988e+01 4.824e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-12 11:07:38,616 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 11:07:55,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1606210.0, ans=0.0 2024-08-12 11:08:32,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-12 11:08:49,725 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 11:08:50,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1606510.0, ans=0.125 2024-08-12 11:09:03,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1250, loss[loss=0.0848, beats_loss=0.01202, ecapa_loss=0.000137, whisper_loss=0.07141, over 18073.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.000169, whisper_loss=0.09165, over 3811656.42 frames. 
], batch size: 69, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:09:03,355 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 11:09:08,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1606610.0, ans=0.125 2024-08-12 11:09:17,487 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 11:09:20,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1606710.0, ans=0.125 2024-08-12 11:09:29,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2024-08-12 11:09:35,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1606810.0, ans=0.0 2024-08-12 11:09:39,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1606810.0, ans=0.125 2024-08-12 11:09:48,845 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 11:10:18,156 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 11:10:28,166 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1300, loss[loss=0.121, beats_loss=0.01058, ecapa_loss=0.000124, whisper_loss=0.1092, over 19661.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001696, whisper_loss=0.09143, over 3815748.34 frames. ], batch size: 72, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:10:38,684 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 11:10:44,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.467e+01 2.705e+01 3.116e+01 5.074e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-12 11:10:56,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-08-12 11:11:21,677 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 11:11:40,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1607510.0, ans=0.125 2024-08-12 11:11:49,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1350, loss[loss=0.09908, beats_loss=0.01159, ecapa_loss=0.0001536, whisper_loss=0.08595, over 14663.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001707, whisper_loss=0.09132, over 3821526.58 frames. ], batch size: 59, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:12:04,615 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 11:12:26,103 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 11:12:32,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1607810.0, ans=0.125 2024-08-12 11:12:44,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.30 vs. limit=10.0 2024-08-12 11:13:11,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1400, loss[loss=0.1201, beats_loss=0.00995, ecapa_loss=0.000149, whisper_loss=0.1086, over 20631.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001703, whisper_loss=0.09137, over 3836152.22 frames. 
], batch size: 77, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:13:21,965 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 11:13:22,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1608110.0, ans=0.125 2024-08-12 11:13:22,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2024-08-12 11:13:27,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.408e+01 2.816e+01 3.296e+01 5.087e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-12 11:13:50,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-12 11:14:09,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1608410.0, ans=0.125 2024-08-12 11:14:15,369 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
21 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 11:14:15,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1608410.0, ans=0.1 2024-08-12 11:14:27,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1608510.0, ans=0.125 2024-08-12 11:14:27,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1608510.0, ans=0.125 2024-08-12 11:14:34,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1608610.0, ans=0.0 2024-08-12 11:14:34,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1450, loss[loss=0.08466, beats_loss=0.01336, ecapa_loss=0.0001246, whisper_loss=0.07006, over 18574.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.011, ecapa_loss=0.0001697, whisper_loss=0.09094, over 3817795.73 frames. ], batch size: 71, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:15:20,261 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 11:15:29,314 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 11:15:33,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1608810.0, ans=0.125 2024-08-12 11:15:52,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1608910.0, ans=0.125 2024-08-12 11:16:04,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1609010.0, ans=0.035 2024-08-12 11:16:11,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.27 vs. limit=22.5 2024-08-12 11:16:22,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1500, loss[loss=0.098, beats_loss=0.008666, ecapa_loss=0.000181, whisper_loss=0.08753, over 19994.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01096, ecapa_loss=0.0001702, whisper_loss=0.09024, over 3807594.78 frames. ], batch size: 76, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:16:24,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1609110.0, ans=0.1 2024-08-12 11:16:37,631 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:16:38,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.429e+01 2.735e+01 3.054e+01 5.898e+01, threshold=5.470e+01, percent-clipped=1.0 2024-08-12 11:16:50,026 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 11:16:55,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.13 vs. limit=10.0 2024-08-12 11:17:06,255 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 11:17:22,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1609410.0, ans=0.125 2024-08-12 11:17:52,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1550, loss[loss=0.09941, beats_loss=0.009367, ecapa_loss=0.0001907, whisper_loss=0.08814, over 17114.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01094, ecapa_loss=0.0001705, whisper_loss=0.09014, over 3821241.79 frames. ], batch size: 70, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:17:57,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1609610.0, ans=0.0 2024-08-12 11:18:18,973 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 11:18:26,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1609810.0, ans=0.1 2024-08-12 11:18:44,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1609910.0, ans=0.125 2024-08-12 11:18:54,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1609910.0, ans=0.0 2024-08-12 11:19:08,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-12 11:19:15,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1610010.0, ans=0.125 2024-08-12 11:19:17,743 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 11:19:19,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1600, loss[loss=0.1216, beats_loss=0.01154, ecapa_loss=0.0001606, whisper_loss=0.1085, over 17249.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01094, ecapa_loss=0.0001692, whisper_loss=0.09067, over 3821279.69 frames. ], batch size: 67, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:19:20,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=22.5 2024-08-12 11:19:21,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=12.0 2024-08-12 11:19:36,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.594e+01 2.878e+01 3.251e+01 6.117e+01, threshold=5.756e+01, percent-clipped=2.0 2024-08-12 11:19:40,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1610210.0, ans=0.125 2024-08-12 11:20:10,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1610410.0, ans=0.05 2024-08-12 11:20:11,949 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 25 from LS+wenet, 8 from Vox, 25 from AS 2024-08-12 11:20:18,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-08-12 11:20:23,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0 2024-08-12 11:20:30,911 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 from AS 2024-08-12 11:20:44,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1610610.0, ans=0.2 2024-08-12 11:20:45,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1650, loss[loss=0.1159, beats_loss=0.01227, ecapa_loss=0.0001582, whisper_loss=0.102, over 22496.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01098, ecapa_loss=0.0001696, whisper_loss=0.09041, over 3818377.70 frames. ], batch size: 88, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:21:03,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1610710.0, ans=0.0 2024-08-12 11:21:19,326 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 18 from Vox, 40 from AS 2024-08-12 11:21:24,429 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS 2024-08-12 11:21:35,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2024-08-12 11:22:08,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1700, loss[loss=0.1123, beats_loss=0.008865, ecapa_loss=0.0001687, whisper_loss=0.1018, over 23588.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001697, whisper_loss=0.09122, over 3839800.26 frames. ], batch size: 93, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:22:11,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1611110.0, ans=0.2 2024-08-12 11:22:23,094 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
26 from LS+wenet, 16 from Vox, 15 from AS 2024-08-12 11:22:24,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.487e+01 2.798e+01 3.265e+01 1.299e+02, threshold=5.596e+01, percent-clipped=2.0 2024-08-12 11:22:36,125 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 16 from Vox, 38 from AS 2024-08-12 11:23:04,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1611410.0, ans=0.125 2024-08-12 11:23:05,593 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 from AS 2024-08-12 11:23:13,546 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 11:23:18,933 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 from AS 2024-08-12 11:23:29,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1750, loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.0001455, whisper_loss=0.09533, over 22028.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001691, whisper_loss=0.09111, over 3871158.26 frames. ], batch size: 85, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:23:33,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1611610.0, ans=0.0 2024-08-12 11:23:34,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2024-08-12 11:23:35,125 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 11:24:08,611 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 17 from LS+wenet, 21 from Vox, 43 from AS 2024-08-12 11:24:19,587 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 11:24:33,892 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 from AS 2024-08-12 11:24:48,696 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 11:24:49,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1800, loss[loss=0.1086, beats_loss=0.01023, ecapa_loss=0.0001773, whisper_loss=0.09659, over 21696.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001701, whisper_loss=0.09094, over 3885506.31 frames. ], batch size: 86, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:24:50,085 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 from AS 2024-08-12 11:25:00,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1612110.0, ans=0.125 2024-08-12 11:25:05,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.474e+01 2.742e+01 2.995e+01 4.904e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-12 11:25:14,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-12 11:25:21,630 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 11:25:25,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1612310.0, ans=0.125 2024-08-12 11:25:29,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1612310.0, ans=0.125 2024-08-12 11:25:29,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. 
limit=15.0 2024-08-12 11:25:36,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1612410.0, ans=0.0 2024-08-12 11:25:58,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1612510.0, ans=0.2 2024-08-12 11:26:14,757 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1850, loss[loss=0.118, beats_loss=0.009382, ecapa_loss=0.0001703, whisper_loss=0.107, over 16305.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001704, whisper_loss=0.09183, over 3874732.44 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:26:32,594 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 from AS 2024-08-12 11:26:37,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1612710.0, ans=0.95 2024-08-12 11:26:49,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.90 vs. limit=22.5 2024-08-12 11:26:50,904 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 11:27:30,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5 2024-08-12 11:27:40,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1613010.0, ans=10.0 2024-08-12 11:27:56,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1613010.0, ans=0.0 2024-08-12 11:28:04,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1900, loss[loss=0.09667, beats_loss=0.009102, ecapa_loss=0.000156, whisper_loss=0.08601, over 15026.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001705, whisper_loss=0.09199, over 3843939.93 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:28:06,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-12 11:28:26,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.546e+01 2.864e+01 3.475e+01 5.350e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-12 11:28:41,404 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 from AS 2024-08-12 11:29:03,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1613310.0, ans=0.0 2024-08-12 11:29:27,160 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 from AS 2024-08-12 11:29:28,682 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 from AS 2024-08-12 11:29:44,742 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 1950, loss[loss=0.09838, beats_loss=0.01141, ecapa_loss=0.000215, whisper_loss=0.08483, over 22227.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001714, whisper_loss=0.0911, over 3829929.24 frames. ], batch size: 93, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:29:50,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1613610.0, ans=0.0 2024-08-12 11:30:08,655 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.171e+00 2024-08-12 11:30:24,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1613810.0, ans=0.2 2024-08-12 11:30:30,145 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
15 from LS+wenet, 22 from Vox, 26 from AS 2024-08-12 11:30:46,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1613910.0, ans=0.0 2024-08-12 11:30:59,650 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 14 from Vox, 33 from AS 2024-08-12 11:31:05,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2000, loss[loss=0.09728, beats_loss=0.01307, ecapa_loss=0.0001413, whisper_loss=0.0828, over 23174.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001723, whisper_loss=0.09104, over 3856369.62 frames. ], batch size: 93, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:31:05,674 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 from AS 2024-08-12 11:31:15,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2024-08-12 11:31:20,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.474e+01 2.700e+01 3.035e+01 6.607e+01, threshold=5.401e+01, percent-clipped=2.0 2024-08-12 11:31:25,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1614210.0, ans=0.2 2024-08-12 11:31:34,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1614210.0, ans=0.0 2024-08-12 11:31:41,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1614310.0, ans=0.0 2024-08-12 11:31:41,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1614310.0, ans=0.0 2024-08-12 11:31:59,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1614410.0, ans=0.0 2024-08-12 11:32:02,489 INFO 
[train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 28 from Vox, 22 from AS 2024-08-12 11:32:04,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. limit=10.0 2024-08-12 11:32:07,579 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-12 11:32:24,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2050, loss[loss=0.09396, beats_loss=0.008701, ecapa_loss=0.0002389, whisper_loss=0.08287, over 15720.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01093, ecapa_loss=0.0001743, whisper_loss=0.0908, over 3862744.45 frames. ], batch size: 68, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:32:35,101 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 11:32:49,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1614710.0, ans=0.95 2024-08-12 11:32:52,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1614710.0, ans=0.125 2024-08-12 11:33:33,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-08-12 11:33:38,258 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 11:33:42,831 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 11:33:46,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2100, loss[loss=0.1279, beats_loss=0.0117, ecapa_loss=0.0001481, whisper_loss=0.1147, over 15460.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01105, ecapa_loss=0.0001728, whisper_loss=0.09055, over 3845730.82 frames. 
], batch size: 59, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:33:49,691 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 11:33:57,836 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 37 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 11:34:02,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.515e+01 2.855e+01 3.226e+01 9.750e+01, threshold=5.709e+01, percent-clipped=2.0 2024-08-12 11:34:03,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1615210.0, ans=0.0 2024-08-12 11:34:25,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-12 11:34:36,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1615410.0, ans=0.0 2024-08-12 11:34:48,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1615510.0, ans=0.1 2024-08-12 11:34:50,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1615510.0, ans=0.025 2024-08-12 11:34:59,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1615510.0, ans=0.1 2024-08-12 11:35:04,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2150, loss[loss=0.08374, beats_loss=0.01439, ecapa_loss=0.0001396, whisper_loss=0.06796, over 16976.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01112, ecapa_loss=0.0001723, whisper_loss=0.09021, over 3829583.23 frames. 
], batch size: 66, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:35:14,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1615610.0, ans=0.1 2024-08-12 11:35:32,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1615710.0, ans=0.0 2024-08-12 11:35:47,142 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:35:47,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1615810.0, ans=0.0 2024-08-12 11:35:53,076 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 11:36:02,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=22.5 2024-08-12 11:36:19,341 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS 2024-08-12 11:36:23,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2200, loss[loss=0.09217, beats_loss=0.01458, ecapa_loss=0.000139, whisper_loss=0.07619, over 17811.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01113, ecapa_loss=0.0001718, whisper_loss=0.09065, over 3817356.31 frames. ], batch size: 70, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:36:37,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.39 vs. 
limit=12.0 2024-08-12 11:36:40,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.491e+01 2.779e+01 3.104e+01 1.679e+02, threshold=5.558e+01, percent-clipped=1.0 2024-08-12 11:36:48,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1616210.0, ans=0.0 2024-08-12 11:36:55,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-12 11:37:05,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1616310.0, ans=0.125 2024-08-12 11:37:09,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.41 vs. limit=6.0 2024-08-12 11:37:15,185 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-12 11:37:15,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1616410.0, ans=0.125 2024-08-12 11:37:18,100 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 from AS 2024-08-12 11:37:21,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1616410.0, ans=0.0 2024-08-12 11:37:31,010 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 11:37:44,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2250, loss[loss=0.1005, beats_loss=0.01244, ecapa_loss=0.0001917, whisper_loss=0.08611, over 21904.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.000173, whisper_loss=0.09195, over 3837679.58 frames. 
], batch size: 93, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:37:50,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=12.0 2024-08-12 11:37:59,392 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 26 from LS+wenet, 25 from Vox, 46 from AS 2024-08-12 11:38:00,688 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 11:38:10,724 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 from AS 2024-08-12 11:38:36,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1616910.0, ans=0.0 2024-08-12 11:38:37,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1616910.0, ans=0.1 2024-08-12 11:38:37,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1616910.0, ans=0.2 2024-08-12 11:38:46,540 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 11:39:03,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.68 vs. limit=15.0 2024-08-12 11:39:05,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2300, loss[loss=0.09776, beats_loss=0.0111, ecapa_loss=0.0001381, whisper_loss=0.08529, over 18628.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01111, ecapa_loss=0.0001733, whisper_loss=0.09153, over 3877482.46 frames. 
], batch size: 69, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:39:08,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1617110.0, ans=0.2 2024-08-12 11:39:09,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1617110.0, ans=0.0 2024-08-12 11:39:10,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2024-08-12 11:39:22,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.578e+01 2.776e+01 3.127e+01 7.036e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 11:39:24,412 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 29 from Vox, 33 from AS 2024-08-12 11:39:40,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-12 11:39:43,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2024-08-12 11:39:53,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-12 11:39:56,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2024-08-12 11:40:02,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=22.5 2024-08-12 11:40:21,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1617510.0, ans=0.125 2024-08-12 11:40:25,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2350, loss[loss=0.1195, beats_loss=0.008894, ecapa_loss=0.0001715, whisper_loss=0.1089, over 18006.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.0001738, whisper_loss=0.09252, over 3894313.30 frames. ], batch size: 71, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:40:30,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1617610.0, ans=0.05 2024-08-12 11:40:37,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2024-08-12 11:40:42,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1617710.0, ans=0.0 2024-08-12 11:41:16,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1617910.0, ans=0.0 2024-08-12 11:41:47,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2400, loss[loss=0.08952, beats_loss=0.0108, ecapa_loss=0.0001523, whisper_loss=0.07719, over 15733.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.0001731, whisper_loss=0.09238, over 3890885.38 frames. 
], batch size: 58, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:41:49,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1618110.0, ans=0.0 2024-08-12 11:42:03,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.471e+01 2.708e+01 3.082e+01 4.957e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-12 11:42:17,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1618310.0, ans=0.125 2024-08-12 11:42:22,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1618310.0, ans=0.125 2024-08-12 11:42:57,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1618510.0, ans=0.125 2024-08-12 11:43:06,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2450, loss[loss=0.1089, beats_loss=0.01068, ecapa_loss=0.0001634, whisper_loss=0.09662, over 22157.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001738, whisper_loss=0.09213, over 3878691.34 frames. ], batch size: 88, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:43:19,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1618610.0, ans=0.2 2024-08-12 11:43:20,514 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 11:43:23,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. 
limit=15.0 2024-08-12 11:43:32,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1618710.0, ans=0.1 2024-08-12 11:43:34,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2024-08-12 11:44:13,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1618910.0, ans=0.0 2024-08-12 11:44:18,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1618910.0, ans=0.0 2024-08-12 11:44:49,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2500, loss[loss=0.104, beats_loss=0.008766, ecapa_loss=0.0001711, whisper_loss=0.09348, over 14943.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001742, whisper_loss=0.09213, over 3887594.06 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:44:52,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1619110.0, ans=0.0 2024-08-12 11:44:57,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1619110.0, ans=0.0 2024-08-12 11:45:08,630 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 19 from Vox, 21 from AS 2024-08-12 11:45:10,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.514e+01 2.796e+01 3.106e+01 8.282e+01, threshold=5.592e+01, percent-clipped=2.0 2024-08-12 11:45:39,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1619310.0, ans=0.2 2024-08-12 11:45:51,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1619310.0, ans=0.2 2024-08-12 11:45:51,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-12 11:45:59,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1619410.0, ans=0.125 2024-08-12 11:45:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1619410.0, ans=0.1 2024-08-12 11:46:08,140 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 11:46:39,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2550, loss[loss=0.1177, beats_loss=0.01107, ecapa_loss=0.0001526, whisper_loss=0.1051, over 23040.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001748, whisper_loss=0.09173, over 3890223.42 frames. ], batch size: 91, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:46:59,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-08-12 11:47:12,154 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 36 from LS+wenet, 26 from Vox, 34 from AS 2024-08-12 11:47:26,915 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 11:47:51,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1620010.0, ans=0.125 2024-08-12 11:48:05,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2600, loss[loss=0.121, beats_loss=0.01057, ecapa_loss=0.0001394, whisper_loss=0.109, over 20208.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001744, whisper_loss=0.092, over 3877559.51 frames. ], batch size: 75, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:48:21,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.606e+01 2.871e+01 3.471e+01 6.871e+01, threshold=5.743e+01, percent-clipped=3.0 2024-08-12 11:48:47,085 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 27 from Vox, 27 from AS 2024-08-12 11:48:54,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1620410.0, ans=22.5 2024-08-12 11:49:02,779 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 11:49:03,139 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.076e-02 2024-08-12 11:49:11,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1620510.0, ans=0.125 2024-08-12 11:49:11,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1620510.0, ans=0.0 2024-08-12 11:49:22,146 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 11:49:24,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2650, loss[loss=0.1009, beats_loss=0.01081, ecapa_loss=0.0001618, whisper_loss=0.08845, over 18134.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001751, whisper_loss=0.09169, over 3879925.37 frames. ], batch size: 73, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:49:32,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1620610.0, ans=0.1 2024-08-12 11:49:43,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1620710.0, ans=0.1 2024-08-12 11:49:44,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.92 vs. limit=10.0 2024-08-12 11:49:56,266 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 11:50:01,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1620810.0, ans=0.1 2024-08-12 11:50:04,251 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 11:50:22,791 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 from AS 2024-08-12 11:50:23,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-12 11:50:42,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2700, loss[loss=0.1183, beats_loss=0.009213, ecapa_loss=0.0002201, whisper_loss=0.1069, over 18442.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001742, whisper_loss=0.0921, over 3888915.99 frames. ], batch size: 77, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:50:49,031 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
18 from LS+wenet, 25 from Vox, 27 from AS 2024-08-12 11:50:52,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1621110.0, ans=0.1 2024-08-12 11:50:58,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.509e+01 2.801e+01 3.158e+01 4.809e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 11:51:00,746 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 29 from Vox, 31 from AS 2024-08-12 11:51:02,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1621210.0, ans=0.0 2024-08-12 11:51:02,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1621210.0, ans=0.0 2024-08-12 11:51:07,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0 2024-08-12 11:51:10,030 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 11:51:38,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1621410.0, ans=0.125 2024-08-12 11:51:45,764 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 11:51:46,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1621510.0, ans=0.125 2024-08-12 11:52:02,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2750, loss[loss=0.09188, beats_loss=0.01128, ecapa_loss=0.0001942, whisper_loss=0.07866, over 16359.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.000174, whisper_loss=0.09231, over 3897756.45 frames. 
], batch size: 66, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:52:12,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2024-08-12 11:52:16,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1621610.0, ans=0.0 2024-08-12 11:52:19,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1621710.0, ans=10.0 2024-08-12 11:52:24,208 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 11:52:30,217 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-12 11:53:22,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2800, loss[loss=0.1137, beats_loss=0.01147, ecapa_loss=0.000143, whisper_loss=0.1008, over 22524.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001746, whisper_loss=0.09254, over 3871902.65 frames. ], batch size: 88, lr: 5.33e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:53:37,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.464e+01 2.680e+01 3.068e+01 4.016e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-12 11:53:38,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1622210.0, ans=0.0 2024-08-12 11:54:00,249 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 from AS 2024-08-12 11:54:16,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1622410.0, ans=0.0 2024-08-12 11:54:37,104 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 15 from Vox, 45 from AS 2024-08-12 11:54:43,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2850, loss[loss=0.1059, beats_loss=0.01285, ecapa_loss=0.0001402, whisper_loss=0.09166, over 23333.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001738, whisper_loss=0.09266, over 3869973.10 frames. ], batch size: 92, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:55:09,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1622710.0, ans=0.125 2024-08-12 11:55:16,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1622810.0, ans=0.09899494936611666 2024-08-12 11:55:43,971 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS 2024-08-12 11:55:56,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1623010.0, ans=0.2 2024-08-12 11:56:05,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2900, loss[loss=0.0817, beats_loss=0.01143, ecapa_loss=0.0002049, whisper_loss=0.06822, over 13884.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01098, ecapa_loss=0.0001758, whisper_loss=0.09208, over 3876927.64 frames. 
], batch size: 55, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:56:09,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1623110.0, ans=0.1 2024-08-12 11:56:20,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.518e+01 2.817e+01 3.035e+01 4.423e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-12 11:56:21,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1623210.0, ans=0.125 2024-08-12 11:56:35,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1623210.0, ans=0.125 2024-08-12 11:56:53,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.22 vs. limit=22.5 2024-08-12 11:57:00,255 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 11:57:16,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1623510.0, ans=0.0 2024-08-12 11:57:18,047 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 11:57:25,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 2950, loss[loss=0.1162, beats_loss=0.01091, ecapa_loss=0.0001913, whisper_loss=0.1034, over 19781.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001775, whisper_loss=0.09142, over 3848273.63 frames. ], batch size: 81, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:57:25,337 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 11:57:36,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=12.0 2024-08-12 11:58:18,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1623910.0, ans=0.1 2024-08-12 11:58:19,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-12 11:58:21,377 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 11:58:23,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1623910.0, ans=0.125 2024-08-12 11:58:24,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1623910.0, ans=0.0 2024-08-12 11:58:29,422 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 11:58:44,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3000, loss[loss=0.1077, beats_loss=0.01061, ecapa_loss=0.0001685, whisper_loss=0.09541, over 19470.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01098, ecapa_loss=0.0001765, whisper_loss=0.09237, over 3879747.85 frames. ], batch size: 74, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:58:44,649 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 11:59:25,767 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on ASR_libri: loss=0.256, beats_loss=0, ecapa_loss=0.0005941, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 11:59:44,970 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on SV_voxceleb1: loss=0.00471, beats_loss=0, ecapa_loss=0.000471, whisper_loss=0, over 939242.00 frames. 
2024-08-12 12:00:30,999 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1076, 3.3546, 2.3253, 3.8700], device='cuda:3') 2024-08-12 12:01:46,938 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on AT_audioset: loss=0.02429, beats_loss=0.02429, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 12:01:46,941 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 12:01:49,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-12 12:02:03,128 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.760e-01 2024-08-12 12:02:03,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.553e+01 2.970e+01 3.483e+01 4.771e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-12 12:02:17,141 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 12:02:26,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1624310.0, ans=0.125 2024-08-12 12:02:43,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. 
limit=15.0 2024-08-12 12:02:48,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624510.0, ans=0.1 2024-08-12 12:02:51,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1624510.0, ans=0.1 2024-08-12 12:03:04,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-12 12:03:05,074 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3050, loss[loss=0.09878, beats_loss=0.01048, ecapa_loss=0.0001339, whisper_loss=0.08695, over 18746.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01093, ecapa_loss=0.0001765, whisper_loss=0.09328, over 3922069.72 frames. ], batch size: 71, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:03:34,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-08-12 12:03:48,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1624810.0, ans=0.125 2024-08-12 12:03:51,673 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 12:03:57,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624910.0, ans=0.1 2024-08-12 12:04:25,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3100, loss[loss=0.08571, beats_loss=0.01368, ecapa_loss=0.0001503, whisper_loss=0.07053, over 17963.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01095, ecapa_loss=0.0001781, whisper_loss=0.09322, over 3901263.31 frames. 
], batch size: 74, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:04:25,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1625110.0, ans=0.1 2024-08-12 12:04:36,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-12 12:04:42,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.511e+01 2.833e+01 3.211e+01 6.314e+01, threshold=5.667e+01, percent-clipped=1.0 2024-08-12 12:04:45,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1625210.0, ans=0.125 2024-08-12 12:04:56,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-08-12 12:05:10,175 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 12:05:18,216 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 12:05:43,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3150, loss[loss=0.1056, beats_loss=0.01193, ecapa_loss=0.000191, whisper_loss=0.09179, over 21790.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01092, ecapa_loss=0.0001793, whisper_loss=0.09326, over 3898288.25 frames. ], batch size: 91, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:06:02,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1625710.0, ans=0.0 2024-08-12 12:06:03,614 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 12:06:06,777 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-12 12:06:13,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1625710.0, ans=0.1 2024-08-12 12:06:19,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1625810.0, ans=0.125 2024-08-12 12:06:34,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1625910.0, ans=0.125 2024-08-12 12:06:37,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1625910.0, ans=0.0 2024-08-12 12:06:38,804 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 12:06:49,925 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 12:07:03,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3200, loss[loss=0.08988, beats_loss=0.01258, ecapa_loss=0.0001625, whisper_loss=0.07567, over 21055.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01093, ecapa_loss=0.0001769, whisper_loss=0.09368, over 3912990.83 frames. ], batch size: 84, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:07:04,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1626110.0, ans=0.125 2024-08-12 12:07:16,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1626110.0, ans=0.125 2024-08-12 12:07:21,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.397e+01 2.776e+01 3.062e+01 4.690e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 12:07:35,138 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 12:07:41,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1626310.0, ans=0.1 2024-08-12 12:07:52,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1626410.0, ans=0.125 2024-08-12 12:08:04,586 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 12:08:22,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1626610.0, ans=0.125 2024-08-12 12:08:22,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3250, loss[loss=0.1093, beats_loss=0.009088, ecapa_loss=0.000177, whisper_loss=0.09846, over 15492.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001754, whisper_loss=0.09287, over 3883799.44 frames. ], batch size: 62, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:08:52,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1626810.0, ans=0.1 2024-08-12 12:09:10,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1626910.0, ans=0.0 2024-08-12 12:09:10,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1626910.0, ans=0.1 2024-08-12 12:09:42,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3300, loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001671, whisper_loss=0.08974, over 21314.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01108, ecapa_loss=0.0001751, whisper_loss=0.09207, over 3848816.83 frames. 
], batch size: 83, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:09:45,384 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 12:09:58,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.622e+01 3.065e+01 3.686e+01 1.090e+02, threshold=6.129e+01, percent-clipped=1.0 2024-08-12 12:10:12,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1627310.0, ans=0.1 2024-08-12 12:10:29,322 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 12:10:40,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1627410.0, ans=0.125 2024-08-12 12:10:44,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1627510.0, ans=0.125 2024-08-12 12:10:46,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1627510.0, ans=0.125 2024-08-12 12:10:54,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1627510.0, ans=0.125 2024-08-12 12:10:59,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3350, loss[loss=0.09771, beats_loss=0.01004, ecapa_loss=0.0002043, whisper_loss=0.08563, over 13281.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.0001748, whisper_loss=0.09246, over 3859133.35 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:11:34,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1627810.0, ans=0.035 2024-08-12 12:11:37,675 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 12:11:47,205 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 12:11:49,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1627910.0, ans=0.125 2024-08-12 12:11:58,287 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 12:12:17,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3400, loss[loss=0.09324, beats_loss=0.01197, ecapa_loss=0.0001584, whisper_loss=0.07968, over 19257.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01112, ecapa_loss=0.0001754, whisper_loss=0.09101, over 3873224.76 frames. ], batch size: 78, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:12:35,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.455e+01 2.782e+01 3.017e+01 1.106e+02, threshold=5.563e+01, percent-clipped=1.0 2024-08-12 12:12:46,422 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 12:12:53,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1628310.0, ans=0.0 2024-08-12 12:12:54,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1628310.0, ans=0.1 2024-08-12 12:13:00,436 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 12:13:27,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1628510.0, ans=0.1 2024-08-12 12:13:30,355 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 12:13:36,107 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3450, loss[loss=0.09755, beats_loss=0.00862, ecapa_loss=0.0001983, whisper_loss=0.08694, over 18981.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01115, ecapa_loss=0.0001748, whisper_loss=0.09089, over 3879133.73 frames. ], batch size: 77, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:13:37,632 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 18 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-12 12:13:41,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1628610.0, ans=0.0 2024-08-12 12:13:52,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1628710.0, ans=0.0 2024-08-12 12:13:59,561 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 12:14:21,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1628910.0, ans=0.09899494936611666 2024-08-12 12:14:23,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-12 12:14:36,292 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 12:14:48,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=1629010.0, ans=12.0 2024-08-12 12:14:53,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3500, loss[loss=0.1002, beats_loss=0.01446, ecapa_loss=0.0001972, whisper_loss=0.08373, over 21052.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01116, ecapa_loss=0.0001757, whisper_loss=0.09068, over 3900886.03 frames. 
], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:15:10,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.491e+01 2.788e+01 3.215e+01 5.809e+01, threshold=5.577e+01, percent-clipped=2.0 2024-08-12 12:15:13,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1629210.0, ans=0.5 2024-08-12 12:15:14,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1629210.0, ans=0.1 2024-08-12 12:15:18,768 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 12:15:23,957 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 12:15:27,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-12 12:15:30,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2024-08-12 12:15:43,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1629410.0, ans=0.025 2024-08-12 12:15:45,481 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 12:15:47,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1629410.0, ans=0.125 2024-08-12 12:15:51,857 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 12:16:12,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3550, loss[loss=0.103, beats_loss=0.009964, ecapa_loss=0.0001615, whisper_loss=0.09144, over 18635.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01112, ecapa_loss=0.000175, whisper_loss=0.09066, over 3898505.63 frames. ], batch size: 71, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:16:17,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1629610.0, ans=0.125 2024-08-12 12:16:21,108 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 12:16:30,752 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 12:16:35,433 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 12:16:43,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1629810.0, ans=0.125 2024-08-12 12:16:45,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1629810.0, ans=0.125 2024-08-12 12:16:53,652 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 12:17:18,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1630010.0, ans=0.125 2024-08-12 12:17:26,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-08-12 12:17:28,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3600, loss[loss=0.1104, beats_loss=0.009464, ecapa_loss=0.0001447, whisper_loss=0.09945, over 17369.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01115, ecapa_loss=0.0001751, whisper_loss=0.09039, over 3872339.31 frames. ], batch size: 63, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:17:29,142 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 12:17:36,331 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 12:17:45,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.537e+01 2.866e+01 3.271e+01 6.335e+01, threshold=5.732e+01, percent-clipped=1.0 2024-08-12 12:17:45,552 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 12:17:58,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1630310.0, ans=0.2 2024-08-12 12:18:25,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1630410.0, ans=0.1 2024-08-12 12:18:26,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:18:46,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3650, loss[loss=0.1126, beats_loss=0.0107, ecapa_loss=0.0001623, whisper_loss=0.1003, over 20790.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01109, ecapa_loss=0.0001753, whisper_loss=0.09087, over 3846660.99 frames. ], batch size: 79, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:18:47,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=15.0 2024-08-12 12:19:07,788 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
27 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 12:19:12,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1630710.0, ans=0.125 2024-08-12 12:19:13,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1630710.0, ans=0.125 2024-08-12 12:19:16,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-12 12:19:19,762 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 12:19:33,537 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-12 12:19:57,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1631010.0, ans=0.125 2024-08-12 12:19:59,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1631010.0, ans=0.125 2024-08-12 12:20:05,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3700, loss[loss=0.09563, beats_loss=0.01295, ecapa_loss=0.000164, whisper_loss=0.08104, over 16499.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01108, ecapa_loss=0.0001761, whisper_loss=0.0905, over 3855737.21 frames. 
], batch size: 68, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:20:23,218 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.690e+01 3.090e+01 3.461e+01 6.737e+01, threshold=6.180e+01, percent-clipped=1.0
2024-08-12 12:20:28,712 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 12:20:42,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1631310.0, ans=0.0
2024-08-12 12:20:43,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1631310.0, ans=0.125
2024-08-12 12:20:54,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1631410.0, ans=0.0
2024-08-12 12:21:03,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1631410.0, ans=0.0
2024-08-12 12:21:04,639 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 12:21:09,149 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS
2024-08-12 12:21:09,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1631510.0, ans=0.1
2024-08-12 12:21:24,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3750, loss[loss=0.1094, beats_loss=0.01106, ecapa_loss=0.0001134, whisper_loss=0.09724, over 21545.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001763, whisper_loss=0.09117, over 3860918.88 frames. ], batch size: 79, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:21:27,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5
2024-08-12 12:21:30,168 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS
2024-08-12 12:21:32,911 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 19 from Vox, 33 from AS
2024-08-12 12:21:55,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1631810.0, ans=0.05
2024-08-12 12:22:17,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1631910.0, ans=0.0
2024-08-12 12:22:24,137 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 from AS
2024-08-12 12:22:37,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1632010.0, ans=0.0
2024-08-12 12:22:44,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3800, loss[loss=0.09953, beats_loss=0.01054, ecapa_loss=0.0001846, whisper_loss=0.08714, over 15434.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.0001768, whisper_loss=0.09126, over 3870320.97 frames. ], batch size: 61, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:22:54,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1632110.0, ans=0.2
2024-08-12 12:22:58,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1632110.0, ans=0.0
2024-08-12 12:23:02,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.541e+01 2.857e+01 3.346e+01 7.613e+01, threshold=5.713e+01, percent-clipped=1.0
2024-08-12 12:23:03,953 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 from AS
2024-08-12 12:23:10,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1632210.0, ans=0.1
2024-08-12 12:23:21,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0
2024-08-12 12:23:30,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0
2024-08-12 12:23:35,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2024-08-12 12:23:38,958 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 from AS
2024-08-12 12:23:46,543 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 12 from Vox, 34 from AS
2024-08-12 12:23:48,258 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 from AS
2024-08-12 12:23:54,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1632510.0, ans=0.0
2024-08-12 12:24:01,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3850, loss[loss=0.1331, beats_loss=0.009385, ecapa_loss=0.0001873, whisper_loss=0.1219, over 24183.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.000178, whisper_loss=0.09177, over 3875133.15 frames. ], batch size: 93, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:24:12,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1632610.0, ans=0.0
2024-08-12 12:24:44,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1632810.0, ans=0.1
2024-08-12 12:24:52,095 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 from AS
2024-08-12 12:24:53,452 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 from AS
2024-08-12 12:25:17,931 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 12:25:21,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1633110.0, ans=0.125
2024-08-12 12:25:22,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3900, loss[loss=0.09682, beats_loss=0.01202, ecapa_loss=0.0001907, whisper_loss=0.08289, over 21546.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001795, whisper_loss=0.09185, over 3906571.75 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:25:35,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1633110.0, ans=0.125
2024-08-12 12:25:39,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.512e+01 2.803e+01 3.159e+01 7.102e+01, threshold=5.607e+01, percent-clipped=1.0
2024-08-12 12:25:56,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1633310.0, ans=0.0
2024-08-12 12:26:14,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1633410.0, ans=0.2
2024-08-12 12:26:26,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1633510.0, ans=0.0
2024-08-12 12:26:31,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1633510.0, ans=0.125
2024-08-12 12:26:41,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 3950, loss[loss=0.1106, beats_loss=0.01083, ecapa_loss=0.0001527, whisper_loss=0.09821, over 22679.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001798, whisper_loss=0.09253, over 3896316.47 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:26:47,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1633610.0, ans=0.2
2024-08-12 12:27:05,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1633710.0, ans=0.125
2024-08-12 12:27:11,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1633710.0, ans=0.2
2024-08-12 12:27:12,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1633810.0, ans=0.0
2024-08-12 12:27:21,650 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 from AS
2024-08-12 12:27:29,392 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 from AS
2024-08-12 12:27:33,787 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 from AS
2024-08-12 12:27:36,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-08-12 12:27:47,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1634010.0, ans=0.0
2024-08-12 12:27:49,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1634010.0, ans=0.125
2024-08-12 12:27:54,543 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 12:27:58,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0
2024-08-12 12:27:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1634110.0, ans=0.125
2024-08-12 12:28:00,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4000, loss[loss=0.1122, beats_loss=0.01089, ecapa_loss=0.0001452, whisper_loss=0.09981, over 21359.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01098, ecapa_loss=0.0001801, whisper_loss=0.09278, over 3906144.76 frames. ], batch size: 82, lr: 5.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:28:11,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1634110.0, ans=0.125
2024-08-12 12:28:12,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1634110.0, ans=0.04949747468305833
2024-08-12 12:28:16,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.585e+01 2.882e+01 3.381e+01 6.617e+01, threshold=5.764e+01, percent-clipped=3.0
2024-08-12 12:28:17,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1634210.0, ans=0.125
2024-08-12 12:28:32,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1634310.0, ans=0.125
2024-08-12 12:28:37,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1634310.0, ans=0.0
2024-08-12 12:28:43,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1634310.0, ans=0.1
2024-08-12 12:29:00,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1634410.0, ans=0.125
2024-08-12 12:29:11,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=22.5
2024-08-12 12:29:11,963 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 from AS
2024-08-12 12:29:18,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4050, loss[loss=0.11, beats_loss=0.01269, ecapa_loss=0.0001464, whisper_loss=0.09589, over 22973.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0109, ecapa_loss=0.0001795, whisper_loss=0.09381, over 3903732.11 frames. ], batch size: 91, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:29:24,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1634610.0, ans=0.0
2024-08-12 12:29:46,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1634710.0, ans=0.1
2024-08-12 12:29:48,860 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS
2024-08-12 12:29:52,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1634810.0, ans=0.0
2024-08-12 12:29:55,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0
2024-08-12 12:30:33,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1635010.0, ans=0.125
2024-08-12 12:30:39,534 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4100, loss[loss=0.1046, beats_loss=0.01191, ecapa_loss=0.000136, whisper_loss=0.09132, over 15431.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01091, ecapa_loss=0.0001793, whisper_loss=0.09414, over 3881542.59 frames. ], batch size: 58, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:30:56,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.490e+01 2.729e+01 3.052e+01 9.662e+01, threshold=5.458e+01, percent-clipped=1.0
2024-08-12 12:31:08,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1635210.0, ans=0.0
2024-08-12 12:31:16,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1635310.0, ans=0.125
2024-08-12 12:31:26,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1635410.0, ans=0.0
2024-08-12 12:31:32,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1635410.0, ans=0.0
2024-08-12 12:31:47,453 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 from AS
2024-08-12 12:32:00,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4150, loss[loss=0.08379, beats_loss=0.01272, ecapa_loss=0.0001375, whisper_loss=0.06969, over 15276.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0109, ecapa_loss=0.0001799, whisper_loss=0.09353, over 3873309.75 frames. ], batch size: 59, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:32:12,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0
2024-08-12 12:32:23,036 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 from AS
2024-08-12 12:32:31,732 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 16 from Vox, 51 from AS
2024-08-12 12:32:33,252 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 from AS
2024-08-12 12:32:43,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0
2024-08-12 12:32:46,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0
2024-08-12 12:32:59,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1635910.0, ans=0.1
2024-08-12 12:32:59,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1635910.0, ans=0.125
2024-08-12 12:33:18,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1636010.0, ans=0.0
2024-08-12 12:33:20,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4200, loss[loss=0.08, beats_loss=0.01333, ecapa_loss=0.000148, whisper_loss=0.06519, over 16530.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001779, whisper_loss=0.09297, over 3882601.55 frames. ], batch size: 64, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:33:31,892 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.687e-01
2024-08-12 12:33:35,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1636210.0, ans=0.2
2024-08-12 12:33:36,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1636210.0, ans=0.125
2024-08-12 12:33:37,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.467e+01 2.734e+01 3.043e+01 4.289e+01, threshold=5.468e+01, percent-clipped=0.0
2024-08-12 12:33:39,287 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-12 12:33:48,453 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 from AS
2024-08-12 12:34:04,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0
2024-08-12 12:34:20,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5
2024-08-12 12:34:23,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1636510.0, ans=0.0
2024-08-12 12:34:39,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4250, loss[loss=0.1017, beats_loss=0.01017, ecapa_loss=0.0001982, whisper_loss=0.08958, over 15241.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01099, ecapa_loss=0.0001776, whisper_loss=0.09308, over 3891836.90 frames. ], batch size: 63, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:34:43,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1636610.0, ans=0.2
2024-08-12 12:34:53,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636610.0, ans=0.1
2024-08-12 12:34:54,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1636710.0, ans=0.035
2024-08-12 12:35:04,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0
2024-08-12 12:35:09,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1636810.0, ans=0.125
2024-08-12 12:35:26,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1636910.0, ans=0.125
2024-08-12 12:35:27,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1636910.0, ans=0.0
2024-08-12 12:35:43,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1637010.0, ans=0.1
2024-08-12 12:35:49,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1637010.0, ans=0.0
2024-08-12 12:35:58,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4300, loss[loss=0.09019, beats_loss=0.01193, ecapa_loss=0.0002051, whisper_loss=0.07621, over 18792.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01098, ecapa_loss=0.0001775, whisper_loss=0.09283, over 3868230.80 frames. ], batch size: 80, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:36:09,708 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 12:36:13,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1637210.0, ans=0.1
2024-08-12 12:36:14,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.07 vs. limit=15.0
2024-08-12 12:36:15,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.507e+01 2.747e+01 3.144e+01 4.891e+01, threshold=5.494e+01, percent-clipped=0.0
2024-08-12 12:36:37,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1637310.0, ans=15.0
2024-08-12 12:36:40,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1637310.0, ans=0.125
2024-08-12 12:37:08,471 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 12:37:16,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4350, loss[loss=0.1111, beats_loss=0.01306, ecapa_loss=0.0001408, whisper_loss=0.09668, over 19786.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001789, whisper_loss=0.09204, over 3851995.48 frames. ], batch size: 78, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:37:22,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1637610.0, ans=0.0
2024-08-12 12:37:30,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1637610.0, ans=0.125
2024-08-12 12:37:37,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0
2024-08-12 12:37:43,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1637710.0, ans=0.125
2024-08-12 12:37:58,042 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 10 from Vox, 24 from AS
2024-08-12 12:38:06,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5
2024-08-12 12:38:09,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0
2024-08-12 12:38:13,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1637910.0, ans=0.125
2024-08-12 12:38:15,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1637910.0, ans=0.125
2024-08-12 12:38:16,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1637910.0, ans=0.0
2024-08-12 12:38:35,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1638110.0, ans=0.0
2024-08-12 12:38:36,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4400, loss[loss=0.1203, beats_loss=0.01171, ecapa_loss=0.0001473, whisper_loss=0.1071, over 20914.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001784, whisper_loss=0.09173, over 3864985.80 frames. ], batch size: 79, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:38:55,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.518e+01 2.794e+01 3.242e+01 9.315e+01, threshold=5.589e+01, percent-clipped=2.0
2024-08-12 12:38:58,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1638210.0, ans=0.125
2024-08-12 12:39:34,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1638410.0, ans=0.125
2024-08-12 12:39:38,954 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 12:39:58,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1638610.0, ans=0.125
2024-08-12 12:39:59,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4450, loss[loss=0.1206, beats_loss=0.009017, ecapa_loss=0.0001946, whisper_loss=0.1096, over 21411.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001798, whisper_loss=0.09172, over 3885987.47 frames. ], batch size: 86, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:40:02,758 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 from AS
2024-08-12 12:40:04,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.28 vs. limit=15.0
2024-08-12 12:40:18,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1638710.0, ans=0.0
2024-08-12 12:40:18,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1638710.0, ans=0.125
2024-08-12 12:40:21,130 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 from AS
2024-08-12 12:40:49,791 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS
2024-08-12 12:40:56,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1638910.0, ans=10.0
2024-08-12 12:41:05,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1639010.0, ans=0.0
2024-08-12 12:41:13,381 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 from AS
2024-08-12 12:41:13,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1639010.0, ans=0.125
2024-08-12 12:41:19,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4500, loss[loss=0.0965, beats_loss=0.01201, ecapa_loss=0.0002249, whisper_loss=0.08224, over 21041.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001801, whisper_loss=0.09192, over 3861523.23 frames. ], batch size: 89, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:41:37,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.496e+01 3.007e+01 3.529e+01 6.889e+01, threshold=6.014e+01, percent-clipped=3.0
2024-08-12 12:41:59,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1639310.0, ans=0.0
2024-08-12 12:41:59,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. limit=10.0
2024-08-12 12:42:06,716 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 from AS
2024-08-12 12:42:14,454 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS
2024-08-12 12:42:20,636 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 14 from Vox, 40 from AS
2024-08-12 12:42:38,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4550, loss[loss=0.08315, beats_loss=0.01427, ecapa_loss=0.0001618, whisper_loss=0.06727, over 13782.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001786, whisper_loss=0.09182, over 3877750.03 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:42:45,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1639610.0, ans=0.125
2024-08-12 12:42:59,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1639710.0, ans=0.0
2024-08-12 12:43:09,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1639810.0, ans=0.125
2024-08-12 12:43:09,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1639810.0, ans=0.125
2024-08-12 12:43:29,137 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-12 12:43:43,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1640010.0, ans=0.0
2024-08-12 12:43:48,209 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS
2024-08-12 12:43:57,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4600, loss[loss=0.0959, beats_loss=0.01021, ecapa_loss=0.0001822, whisper_loss=0.08387, over 22115.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001783, whisper_loss=0.09153, over 3875447.64 frames. ], batch size: 91, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:44:06,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0
2024-08-12 12:44:14,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.449e+01 2.715e+01 3.086e+01 6.580e+01, threshold=5.431e+01, percent-clipped=1.0
2024-08-12 12:44:14,684 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 16 from Vox, 38 from AS
2024-08-12 12:44:29,486 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 from AS
2024-08-12 12:44:36,388 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 from AS
2024-08-12 12:44:49,332 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 from AS
2024-08-12 12:45:04,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1640510.0, ans=0.1
2024-08-12 12:45:16,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4650, loss[loss=0.09614, beats_loss=0.01122, ecapa_loss=0.0002282, whisper_loss=0.08263, over 14667.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0109, ecapa_loss=0.0001771, whisper_loss=0.09289, over 3869176.94 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:45:17,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1640610.0, ans=0.1
2024-08-12 12:45:29,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.18 vs. limit=22.5
2024-08-12 12:45:42,181 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 12:45:59,290 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 from AS
2024-08-12 12:46:01,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1640810.0, ans=0.125
2024-08-12 12:46:03,930 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 14 from Vox, 48 from AS
2024-08-12 12:46:04,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0
2024-08-12 12:46:17,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1641010.0, ans=0.1
2024-08-12 12:46:23,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1641010.0, ans=0.1
2024-08-12 12:46:24,273 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS
2024-08-12 12:46:36,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4700, loss[loss=0.1036, beats_loss=0.009855, ecapa_loss=0.0001919, whisper_loss=0.09178, over 22144.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.000176, whisper_loss=0.09231, over 3842596.25 frames. ], batch size: 91, lr: 5.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:46:48,073 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 26 from LS+wenet, 11 from Vox, 24 from AS
2024-08-12 12:46:50,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1641110.0, ans=0.09899494936611666
2024-08-12 12:46:54,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.509e+01 2.776e+01 3.112e+01 6.525e+01, threshold=5.552e+01, percent-clipped=1.0
2024-08-12 12:47:00,826 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS
2024-08-12 12:47:46,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1641510.0, ans=0.0
2024-08-12 12:47:52,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1641510.0, ans=0.2
2024-08-12 12:47:52,806 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.168e+05
2024-08-12 12:47:55,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4750, loss[loss=0.1143, beats_loss=0.007426, ecapa_loss=0.0001905, whisper_loss=0.105, over 14954.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01088, ecapa_loss=0.000175, whisper_loss=0.09289, over 3849689.43 frames. ], batch size: 58, lr: 5.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:48:02,846 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS
2024-08-12 12:48:28,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1641810.0, ans=0.125
2024-08-12 12:48:44,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1641910.0, ans=0.025
2024-08-12 12:48:48,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1641910.0, ans=0.09899494936611666
2024-08-12 12:49:10,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4800, loss[loss=0.1076, beats_loss=0.01047, ecapa_loss=0.0001714, whisper_loss=0.09546, over 22344.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01096, ecapa_loss=0.0001763, whisper_loss=0.09297, over 3873681.94 frames. ], batch size: 89, lr: 5.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:49:14,956 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS
2024-08-12 12:49:28,543 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.489e+01 2.813e+01 3.178e+01 7.863e+01, threshold=5.627e+01, percent-clipped=2.0
2024-08-12 12:49:44,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1642310.0, ans=0.125
2024-08-12 12:49:51,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1642310.0, ans=0.125
2024-08-12 12:49:55,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1642310.0, ans=0.125
2024-08-12 12:49:58,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1642410.0, ans=0.125
2024-08-12 12:50:28,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4850, loss[loss=0.09391, beats_loss=0.008937, ecapa_loss=0.0002173, whisper_loss=0.0828, over 15375.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001787, whisper_loss=0.09238, over 3874923.58 frames. ], batch size: 61, lr: 5.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:50:35,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.97 vs. limit=10.0
2024-08-12 12:50:39,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1642610.0, ans=0.1
2024-08-12 12:51:01,758 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 12:51:07,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1642810.0, ans=0.125
2024-08-12 12:51:26,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1642910.0, ans=0.1
2024-08-12 12:51:46,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1643110.0, ans=0.125
2024-08-12 12:51:47,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4900, loss[loss=0.1112, beats_loss=0.01211, ecapa_loss=0.0001888, whisper_loss=0.09724, over 22672.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01097, ecapa_loss=0.0001777, whisper_loss=0.09303, over 3893138.91 frames. ], batch size: 93, lr: 5.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 12:52:03,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.565e+01 2.777e+01 3.230e+01 5.434e+01, threshold=5.553e+01, percent-clipped=0.0
2024-08-12 12:52:09,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1643210.0, ans=0.0
2024-08-12 12:52:10,225 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 12:52:13,046 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS
2024-08-12 12:52:23,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1643310.0, ans=0.125
2024-08-12 12:52:28,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs.
limit=15.0 2024-08-12 12:52:32,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1643410.0, ans=0.125 2024-08-12 12:52:58,167 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 12:53:02,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 4950, loss[loss=0.1272, beats_loss=0.01154, ecapa_loss=0.0001542, whisper_loss=0.1141, over 18251.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001767, whisper_loss=0.09273, over 3855807.72 frames. ], batch size: 72, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:53:08,710 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-12 12:53:18,102 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 12:53:26,551 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 12:53:49,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1643910.0, ans=0.0 2024-08-12 12:53:56,406 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 12:54:01,903 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 12:54:03,620 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 12:54:20,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5000, loss[loss=0.09373, beats_loss=0.009541, ecapa_loss=0.0001654, whisper_loss=0.08253, over 19971.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001767, whisper_loss=0.09215, over 3897396.11 frames. 
], batch size: 80, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:54:26,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1644110.0, ans=0.0 2024-08-12 12:54:36,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.385e+01 2.734e+01 3.105e+01 6.733e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-12 12:54:48,301 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 12:55:08,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1644410.0, ans=0.05 2024-08-12 12:55:30,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1644510.0, ans=0.2 2024-08-12 12:55:37,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5050, loss[loss=0.1027, beats_loss=0.01168, ecapa_loss=0.0001709, whisper_loss=0.08931, over 21592.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001762, whisper_loss=0.09221, over 3911790.16 frames. ], batch size: 90, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:55:54,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1644710.0, ans=0.125 2024-08-12 12:56:10,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1644810.0, ans=0.025 2024-08-12 12:56:21,848 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 12:56:23,221 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 12:56:45,086 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 12:56:45,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1645010.0, ans=0.0 2024-08-12 12:56:56,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5100, loss[loss=0.1091, beats_loss=0.01158, ecapa_loss=0.0001228, whisper_loss=0.09631, over 17064.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001758, whisper_loss=0.09243, over 3916556.54 frames. ], batch size: 63, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:57:00,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1645110.0, ans=0.1 2024-08-12 12:57:05,582 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 12:57:11,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2024-08-12 12:57:13,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.597e+01 2.875e+01 3.428e+01 8.355e+01, threshold=5.751e+01, percent-clipped=1.0 2024-08-12 12:57:20,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. 
limit=12.0 2024-08-12 12:57:21,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1645210.0, ans=0.125 2024-08-12 12:57:27,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:40,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1645410.0, ans=0.09899494936611666 2024-08-12 12:57:45,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1645410.0, ans=0.0 2024-08-12 12:58:01,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1645510.0, ans=0.1 2024-08-12 12:58:12,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5150, loss[loss=0.1278, beats_loss=0.008851, ecapa_loss=0.0001492, whisper_loss=0.1175, over 21249.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001755, whisper_loss=0.09264, over 3938290.00 frames. ], batch size: 79, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:58:12,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1645610.0, ans=0.125 2024-08-12 12:58:42,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.01 vs. limit=22.5 2024-08-12 12:58:51,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. 
limit=15.0 2024-08-12 12:58:56,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1645910.0, ans=0.0 2024-08-12 12:58:59,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1645910.0, ans=0.125 2024-08-12 12:59:17,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1646010.0, ans=0.0 2024-08-12 12:59:18,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1646010.0, ans=0.1 2024-08-12 12:59:21,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1646010.0, ans=0.0 2024-08-12 12:59:24,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5200, loss[loss=0.08963, beats_loss=0.01079, ecapa_loss=0.0001899, whisper_loss=0.07694, over 13619.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001747, whisper_loss=0.0922, over 3913382.69 frames. 
], batch size: 53, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:59:24,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1646110.0, ans=0.07 2024-08-12 12:59:30,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1646110.0, ans=0.035 2024-08-12 12:59:32,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1646110.0, ans=0.2 2024-08-12 12:59:35,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1646110.0, ans=0.125 2024-08-12 12:59:39,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.626e+01 2.923e+01 3.403e+01 3.236e+02, threshold=5.847e+01, percent-clipped=1.0 2024-08-12 12:59:41,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1646210.0, ans=0.2 2024-08-12 12:59:50,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1646310.0, ans=0.0 2024-08-12 12:59:51,707 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 13:00:05,670 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 13:00:09,527 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 13:00:15,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-12 13:00:15,918 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 13:00:21,722 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 13:00:32,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5250, loss[loss=0.1191, beats_loss=0.009813, ecapa_loss=0.0001629, whisper_loss=0.1076, over 18683.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01107, ecapa_loss=0.000175, whisper_loss=0.09213, over 3898799.29 frames. ], batch size: 72, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:00:33,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-08-12 13:00:39,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2024-08-12 13:00:42,050 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 13:00:43,493 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 13:00:55,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1646710.0, ans=0.0 2024-08-12 13:01:23,698 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 13:01:30,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2024-08-12 13:01:33,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1647010.0, ans=0.015 2024-08-12 13:01:34,895 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 13:01:35,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.29 vs. 
limit=12.0 2024-08-12 13:01:36,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1647010.0, ans=15.0 2024-08-12 13:01:38,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5300, loss[loss=0.1004, beats_loss=0.009419, ecapa_loss=0.0002095, whisper_loss=0.08893, over 13411.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.000176, whisper_loss=0.09228, over 3886906.40 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:01:54,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.492e+01 2.766e+01 3.259e+01 2.039e+02, threshold=5.533e+01, percent-clipped=1.0 2024-08-12 13:01:59,380 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 13:02:03,719 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 13:02:20,651 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 13:02:29,666 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 9 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 13:02:30,948 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 13:02:34,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-08-12 13:02:43,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5350, loss[loss=0.11, beats_loss=0.0108, ecapa_loss=0.0001821, whisper_loss=0.09739, over 19728.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001751, whisper_loss=0.09153, over 3894610.66 frames. 
], batch size: 78, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:02:53,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1647610.0, ans=0.125 2024-08-12 13:02:57,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1647710.0, ans=0.0 2024-08-12 13:02:58,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1647710.0, ans=0.1 2024-08-12 13:03:11,141 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 13:03:11,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1647810.0, ans=0.0 2024-08-12 13:03:16,540 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 13:03:16,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1647810.0, ans=0.125 2024-08-12 13:03:17,692 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 13:03:29,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1647910.0, ans=0.125 2024-08-12 13:03:48,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5400, loss[loss=0.1123, beats_loss=0.01049, ecapa_loss=0.0001616, whisper_loss=0.1002, over 21542.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001743, whisper_loss=0.09158, over 3895424.20 frames. 
], batch size: 84, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:04:04,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.540e+01 2.809e+01 3.411e+01 5.713e+01, threshold=5.618e+01, percent-clipped=1.0 2024-08-12 13:04:05,868 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 13:04:14,735 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 13:04:19,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1648310.0, ans=0.0 2024-08-12 13:04:27,269 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 13:04:27,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1648410.0, ans=0.2 2024-08-12 13:04:36,047 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 13:04:38,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2024-08-12 13:04:43,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1648510.0, ans=0.125 2024-08-12 13:04:47,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1648510.0, ans=0.125 2024-08-12 13:04:54,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5450, loss[loss=0.116, beats_loss=0.01135, ecapa_loss=0.0001694, whisper_loss=0.103, over 21987.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001751, whisper_loss=0.09224, over 3893750.64 frames. 
], batch size: 86, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:05:08,835 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 13:05:12,739 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-12 13:05:29,684 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 13:05:35,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1648910.0, ans=0.5 2024-08-12 13:05:38,858 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 13:05:54,806 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 13:05:57,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1649010.0, ans=0.1 2024-08-12 13:05:59,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5500, loss[loss=0.1064, beats_loss=0.0117, ecapa_loss=0.0001785, whisper_loss=0.09287, over 15293.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.000176, whisper_loss=0.09211, over 3888945.37 frames. ], batch size: 63, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:06:14,361 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 13:06:15,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.546e+01 2.808e+01 3.382e+01 4.653e+01, threshold=5.615e+01, percent-clipped=0.0 2024-08-12 13:06:20,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2024-08-12 13:06:32,176 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 13:06:33,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1649310.0, ans=0.0 2024-08-12 13:06:37,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1649410.0, ans=0.1 2024-08-12 13:06:44,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1649410.0, ans=0.0 2024-08-12 13:06:55,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1649510.0, ans=0.125 2024-08-12 13:07:05,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5550, loss[loss=0.1012, beats_loss=0.01257, ecapa_loss=0.0001716, whisper_loss=0.08694, over 22906.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01101, ecapa_loss=0.0001757, whisper_loss=0.09276, over 3914920.12 frames. ], batch size: 95, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:07:15,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1649610.0, ans=0.125 2024-08-12 13:07:27,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1649710.0, ans=0.0 2024-08-12 13:07:36,697 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-12 13:07:41,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1649810.0, ans=0.1 2024-08-12 13:07:43,238 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
18 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 13:07:43,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1649810.0, ans=0.0 2024-08-12 13:08:12,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1650010.0, ans=0.1 2024-08-12 13:08:17,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1650010.0, ans=0.0 2024-08-12 13:08:21,127 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 13:08:26,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5600, loss[loss=0.1023, beats_loss=0.01176, ecapa_loss=0.0001335, whisper_loss=0.08922, over 22031.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01101, ecapa_loss=0.0001752, whisper_loss=0.0923, over 3920553.41 frames. ], batch size: 86, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:08:31,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2024-08-12 13:08:51,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.532e+01 2.834e+01 3.138e+01 6.030e+01, threshold=5.668e+01, percent-clipped=1.0 2024-08-12 13:09:18,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1650310.0, ans=0.1 2024-08-12 13:09:19,569 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 13:09:40,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1650510.0, ans=0.07 2024-08-12 13:09:40,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1650510.0, ans=0.125 2024-08-12 13:09:41,266 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 13:09:55,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5650, loss[loss=0.1245, beats_loss=0.008544, ecapa_loss=0.0002072, whisper_loss=0.1139, over 20098.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.0001747, whisper_loss=0.09197, over 3951591.72 frames. ], batch size: 77, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:10:11,101 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 13:10:45,098 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 13:10:53,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2024-08-12 13:11:02,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1651010.0, ans=0.2 2024-08-12 13:11:13,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5700, loss[loss=0.1043, beats_loss=0.01259, ecapa_loss=0.0001697, whisper_loss=0.08997, over 22579.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001758, whisper_loss=0.09212, over 3909356.45 frames. 
], batch size: 92, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:11:31,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.530e+01 2.812e+01 3.253e+01 9.696e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 13:11:48,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1651310.0, ans=0.2 2024-08-12 13:11:51,923 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 13:12:24,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1651510.0, ans=0.0 2024-08-12 13:12:28,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=12.0 2024-08-12 13:12:30,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5750, loss[loss=0.1017, beats_loss=0.009488, ecapa_loss=0.0001713, whisper_loss=0.09052, over 20158.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01116, ecapa_loss=0.0001748, whisper_loss=0.09112, over 3887413.38 frames. ], batch size: 80, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:12:37,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2024-08-12 13:12:59,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1651810.0, ans=0.125 2024-08-12 13:13:07,485 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-12 13:13:12,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2024-08-12 13:13:41,682 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 13:13:45,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5800, loss[loss=0.1016, beats_loss=0.009612, ecapa_loss=0.0001786, whisper_loss=0.09016, over 13948.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01115, ecapa_loss=0.0001757, whisper_loss=0.09078, over 3836064.80 frames. ], batch size: 55, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:13:49,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1652110.0, ans=0.04949747468305833 2024-08-12 13:13:52,406 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 13:13:52,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1652110.0, ans=0.0 2024-08-12 13:13:53,959 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 13:14:04,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.480e+01 2.682e+01 3.175e+01 5.563e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-12 13:14:37,390 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 13:14:43,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1652410.0, ans=0.125 2024-08-12 13:14:48,519 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-12 13:14:52,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1652510.0, ans=0.1 2024-08-12 13:14:52,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1652510.0, ans=0.125 2024-08-12 13:14:58,237 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-12 13:15:05,755 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5850, loss[loss=0.1035, beats_loss=0.01191, ecapa_loss=0.0001652, whisper_loss=0.08993, over 22477.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01117, ecapa_loss=0.0001748, whisper_loss=0.09053, over 3851815.68 frames. ], batch size: 90, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:15:09,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=12.0 2024-08-12 13:15:11,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1652610.0, ans=0.125 2024-08-12 13:15:11,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=12.0 2024-08-12 13:15:26,233 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 13:15:38,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1652810.0, ans=0.125 2024-08-12 13:15:39,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1652810.0, ans=0.0 2024-08-12 13:15:48,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-12 13:15:53,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.24 vs. 
limit=22.5 2024-08-12 13:16:03,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1652910.0, ans=0.125 2024-08-12 13:16:11,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1653010.0, ans=0.2 2024-08-12 13:16:20,274 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 32 from Vox, 25 fro AS 2024-08-12 13:16:25,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5900, loss[loss=0.07932, beats_loss=0.01197, ecapa_loss=0.0001923, whisper_loss=0.06543, over 13305.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0111, ecapa_loss=0.0001765, whisper_loss=0.0904, over 3851952.15 frames. ], batch size: 55, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:16:34,643 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 28 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-12 13:16:43,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.413e+01 2.728e+01 2.999e+01 4.140e+01, threshold=5.456e+01, percent-clipped=0.0 2024-08-12 13:16:50,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1653210.0, ans=0.0 2024-08-12 13:16:53,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1653210.0, ans=0.0 2024-08-12 13:16:56,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1653310.0, ans=0.5 2024-08-12 13:17:10,510 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 13:17:19,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1653410.0, ans=0.1 2024-08-12 13:17:43,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 5950, loss[loss=0.1062, beats_loss=0.01135, ecapa_loss=0.0001813, whisper_loss=0.09299, over 20296.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0112, ecapa_loss=0.0001749, whisper_loss=0.09023, over 3872024.92 frames. ], batch size: 83, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:17:44,814 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 13:17:58,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1653710.0, ans=0.125 2024-08-12 13:18:33,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1653910.0, ans=0.0 2024-08-12 13:18:55,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1654010.0, ans=0.2 2024-08-12 13:19:03,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6000, loss[loss=0.1162, beats_loss=0.008208, ecapa_loss=0.00024, whisper_loss=0.1056, over 17328.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01118, ecapa_loss=0.0001748, whisper_loss=0.09122, over 3896089.43 frames. ], batch size: 69, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:19:03,595 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 13:19:40,038 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005888, whisper_loss=0.2492, over 922467.00 frames. 
2024-08-12 13:19:58,307 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on SV_voxceleb1: loss=0.004729, beats_loss=0, ecapa_loss=0.0004729, whisper_loss=0, over 939242.00 frames. 2024-08-12 13:21:43,868 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on AT_audioset: loss=0.02432, beats_loss=0.02432, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 13:21:43,872 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 13:21:51,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-12 13:21:58,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1654210.0, ans=0.07 2024-08-12 13:22:03,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.605e+01 2.854e+01 3.270e+01 6.510e+01, threshold=5.707e+01, percent-clipped=1.0 2024-08-12 13:22:07,628 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 13:22:11,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1654210.0, ans=0.0 2024-08-12 13:22:15,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1654310.0, ans=0.125 2024-08-12 13:22:25,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1654310.0, ans=0.125 2024-08-12 13:22:25,881 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 13:22:33,067 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 13:23:02,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6050, loss[loss=0.1355, beats_loss=0.007784, ecapa_loss=0.0001627, whisper_loss=0.1261, over 21666.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.000174, whisper_loss=0.09201, over 3897991.33 frames. ], batch size: 81, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:23:44,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1654810.0, ans=0.2 2024-08-12 13:23:53,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1654910.0, ans=0.1 2024-08-12 13:23:54,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1654910.0, ans=0.125 2024-08-12 13:24:10,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1655010.0, ans=0.0 2024-08-12 13:24:23,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6100, loss[loss=0.09012, beats_loss=0.01043, ecapa_loss=0.0001923, whisper_loss=0.07777, over 20077.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01106, ecapa_loss=0.0001753, whisper_loss=0.09172, over 3872241.94 frames. 
], batch size: 85, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:24:37,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1655110.0, ans=0.125 2024-08-12 13:24:42,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.433e+01 2.687e+01 2.996e+01 4.596e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-12 13:24:48,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1655210.0, ans=0.125 2024-08-12 13:25:00,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.61 vs. limit=10.0 2024-08-12 13:25:16,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0 2024-08-12 13:25:42,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6150, loss[loss=0.1074, beats_loss=0.01215, ecapa_loss=0.0001589, whisper_loss=0.09366, over 22505.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.0001753, whisper_loss=0.09191, over 3868079.76 frames. ], batch size: 88, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:26:49,285 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 13:26:55,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2024-08-12 13:26:59,212 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 13:27:01,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6200, loss[loss=0.1172, beats_loss=0.009446, ecapa_loss=0.0001714, whisper_loss=0.106, over 15805.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.000175, whisper_loss=0.09184, over 3888011.64 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:27:07,157 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 13:27:21,059 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.566e+01 2.980e+01 3.459e+01 1.302e+02, threshold=5.960e+01, percent-clipped=3.0 2024-08-12 13:27:30,251 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 13:27:41,501 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 13:27:46,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1656310.0, ans=0.025 2024-08-12 13:27:53,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1656410.0, ans=0.125 2024-08-12 13:28:08,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=12.0 2024-08-12 13:28:10,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1656510.0, ans=0.2 2024-08-12 13:28:14,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1656510.0, ans=0.125 2024-08-12 13:28:20,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6250, loss[loss=0.1021, beats_loss=0.00953, ecapa_loss=0.000204, whisper_loss=0.09054, over 20536.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01113, ecapa_loss=0.0001756, whisper_loss=0.09195, over 3902148.00 frames. 
], batch size: 84, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:28:24,316 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 13:28:29,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1656610.0, ans=0.125 2024-08-12 13:28:30,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1656610.0, ans=0.125 2024-08-12 13:28:31,255 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 13:28:36,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1656710.0, ans=0.2 2024-08-12 13:28:39,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1656710.0, ans=0.125 2024-08-12 13:29:30,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1657010.0, ans=0.1 2024-08-12 13:29:38,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6300, loss[loss=0.09074, beats_loss=0.01417, ecapa_loss=0.000159, whisper_loss=0.07498, over 21641.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01117, ecapa_loss=0.0001759, whisper_loss=0.09161, over 3893772.56 frames. ], batch size: 90, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:29:39,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-12 13:29:43,781 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 13:29:48,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. 
limit=15.0 2024-08-12 13:29:51,979 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 13:29:54,164 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 13:29:56,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.533e+01 2.764e+01 3.173e+01 6.844e+01, threshold=5.528e+01, percent-clipped=1.0 2024-08-12 13:30:02,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1657210.0, ans=0.2 2024-08-12 13:30:06,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1657210.0, ans=0.125 2024-08-12 13:30:06,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2024-08-12 13:30:19,459 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 13:30:23,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1657410.0, ans=0.125 2024-08-12 13:30:32,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1657410.0, ans=0.0 2024-08-12 13:30:36,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1657410.0, ans=0.0 2024-08-12 13:30:36,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1657410.0, ans=0.0 2024-08-12 13:30:49,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. 
limit=10.0 2024-08-12 13:30:54,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6350, loss[loss=0.1272, beats_loss=0.01026, ecapa_loss=0.0001727, whisper_loss=0.1152, over 24134.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01122, ecapa_loss=0.0001753, whisper_loss=0.09119, over 3897671.02 frames. ], batch size: 91, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:30:59,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1657610.0, ans=0.1 2024-08-12 13:31:03,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1657610.0, ans=0.0 2024-08-12 13:31:12,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1657710.0, ans=0.125 2024-08-12 13:31:13,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0 2024-08-12 13:31:22,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1657710.0, ans=0.125 2024-08-12 13:31:24,913 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 13:31:25,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. 
limit=15.0 2024-08-12 13:31:34,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1657810.0, ans=0.1 2024-08-12 13:31:41,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1657910.0, ans=0.125 2024-08-12 13:31:43,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1657910.0, ans=0.125 2024-08-12 13:32:11,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6400, loss[loss=0.1232, beats_loss=0.01059, ecapa_loss=0.0001853, whisper_loss=0.1107, over 16886.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01114, ecapa_loss=0.0001761, whisper_loss=0.0916, over 3904342.90 frames. ], batch size: 66, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:32:23,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=12.0 2024-08-12 13:32:26,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1658210.0, ans=0.1 2024-08-12 13:32:27,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2024-08-12 13:32:29,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.462e+01 2.715e+01 3.060e+01 4.478e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 13:32:38,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1658210.0, ans=0.125 2024-08-12 13:32:45,663 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 13:32:59,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.82 vs. 
limit=15.0 2024-08-12 13:33:20,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-08-12 13:33:24,253 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 13:33:26,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6450, loss[loss=0.1026, beats_loss=0.01104, ecapa_loss=0.0001897, whisper_loss=0.0897, over 17448.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001771, whisper_loss=0.09254, over 3942651.81 frames. ], batch size: 72, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:33:38,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1658610.0, ans=0.2 2024-08-12 13:33:49,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1658710.0, ans=0.125 2024-08-12 13:33:55,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1658810.0, ans=0.0 2024-08-12 13:34:08,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1658810.0, ans=0.035 2024-08-12 13:34:08,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0 2024-08-12 13:34:11,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1658910.0, ans=0.125 2024-08-12 13:34:26,366 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 13:34:37,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1659010.0, ans=0.125 2024-08-12 13:34:41,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6500, loss[loss=0.07273, beats_loss=0.01035, ecapa_loss=0.0002122, whisper_loss=0.06026, over 13966.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01104, ecapa_loss=0.0001762, whisper_loss=0.09277, over 3928813.70 frames. ], batch size: 57, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:34:58,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.642e+01 2.943e+01 3.228e+01 1.281e+02, threshold=5.885e+01, percent-clipped=1.0 2024-08-12 13:35:00,390 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 13:35:05,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-12 13:35:10,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1659310.0, ans=0.0 2024-08-12 13:35:16,354 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 13:35:22,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1659310.0, ans=0.0 2024-08-12 13:35:26,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1659410.0, ans=0.1 2024-08-12 13:35:53,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1659510.0, ans=0.125 2024-08-12 13:35:55,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6550, loss[loss=0.09227, beats_loss=0.0127, ecapa_loss=0.0001642, whisper_loss=0.07794, over 18190.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001758, whisper_loss=0.09252, over 3924184.50 frames. ], batch size: 73, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:35:57,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1659610.0, ans=0.035 2024-08-12 13:36:07,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=8.0 2024-08-12 13:36:09,505 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 13:36:13,911 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 13:36:31,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2024-08-12 13:36:33,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1659810.0, ans=0.1 2024-08-12 13:36:34,749 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 13:36:38,124 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 13:36:41,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1659910.0, ans=0.125 2024-08-12 13:36:47,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1659910.0, ans=0.1 2024-08-12 13:36:53,549 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 13:36:55,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1660010.0, ans=0.0 2024-08-12 13:37:10,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6600, loss[loss=0.1144, beats_loss=0.01109, ecapa_loss=0.0001599, whisper_loss=0.1017, over 23340.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.0001761, whisper_loss=0.09239, over 3954594.01 frames. ], batch size: 91, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:37:14,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1660110.0, ans=0.125 2024-08-12 13:37:26,624 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 13:37:28,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.664e+01 3.043e+01 3.449e+01 7.276e+01, threshold=6.087e+01, percent-clipped=1.0 2024-08-12 13:37:34,369 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 13:37:39,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1660310.0, ans=0.125 2024-08-12 13:37:42,806 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 13:38:00,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1660410.0, ans=0.125 2024-08-12 13:38:23,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6650, loss[loss=0.1126, beats_loss=0.008782, ecapa_loss=0.0001766, whisper_loss=0.102, over 17367.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01104, ecapa_loss=0.0001774, whisper_loss=0.09243, over 3941721.49 frames. ], batch size: 68, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:38:26,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1660610.0, ans=0.2 2024-08-12 13:38:30,807 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 13:38:33,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=12.0 2024-08-12 13:38:41,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1660710.0, ans=0.2 2024-08-12 13:38:50,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-12 13:38:59,991 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 13:39:05,495 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 13:39:12,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1660910.0, ans=0.2 2024-08-12 13:39:15,003 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 13:39:19,938 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 13:39:35,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6700, loss[loss=0.1262, beats_loss=0.008134, ecapa_loss=0.0001967, whisper_loss=0.1161, over 18824.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01099, ecapa_loss=0.0001776, whisper_loss=0.0924, over 3917842.47 frames. ], batch size: 74, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:39:42,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1661110.0, ans=0.0 2024-08-12 13:39:47,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-12 13:39:52,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.550e+01 2.931e+01 3.277e+01 4.693e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 13:39:53,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1661210.0, ans=0.125 2024-08-12 13:40:13,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1661310.0, ans=0.125 2024-08-12 13:40:36,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. 
limit=15.0 2024-08-12 13:40:47,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6750, loss[loss=0.07121, beats_loss=0.01646, ecapa_loss=0.0001439, whisper_loss=0.05331, over 16897.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.000178, whisper_loss=0.0929, over 3919541.09 frames. ], batch size: 70, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:40:52,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1661610.0, ans=0.07 2024-08-12 13:41:09,067 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 31 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 13:41:26,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1661810.0, ans=0.2 2024-08-12 13:41:27,598 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 13:41:36,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1661910.0, ans=0.125 2024-08-12 13:41:55,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.11 vs. limit=10.0 2024-08-12 13:41:58,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6800, loss[loss=0.1001, beats_loss=0.009872, ecapa_loss=0.0001936, whisper_loss=0.08828, over 21616.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001775, whisper_loss=0.09251, over 3908432.59 frames. 
], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:42:14,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1662210.0, ans=0.125 2024-08-12 13:42:14,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.516e+01 2.723e+01 3.027e+01 3.885e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 13:42:21,985 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 14 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 13:42:23,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1662210.0, ans=0.0 2024-08-12 13:42:29,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-12 13:42:32,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-12 13:43:08,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6850, loss[loss=0.1062, beats_loss=0.01043, ecapa_loss=0.000177, whisper_loss=0.09396, over 22023.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01096, ecapa_loss=0.000177, whisper_loss=0.09199, over 3906079.20 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:43:44,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1662810.0, ans=0.125 2024-08-12 13:43:48,274 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 13:44:03,692 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 13:44:05,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1663010.0, ans=0.2 2024-08-12 13:44:06,648 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 13:44:07,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1663010.0, ans=0.0 2024-08-12 13:44:19,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6900, loss[loss=0.0925, beats_loss=0.01108, ecapa_loss=0.000148, whisper_loss=0.07994, over 19666.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001779, whisper_loss=0.09166, over 3867648.02 frames. ], batch size: 77, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:44:26,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1663110.0, ans=0.125 2024-08-12 13:44:35,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.437e+01 2.665e+01 2.983e+01 5.492e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-12 13:45:01,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1663410.0, ans=0.0 2024-08-12 13:45:01,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1663410.0, ans=0.125 2024-08-12 13:45:15,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-12 13:45:18,539 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 13:45:20,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-12 13:45:21,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1663510.0, ans=0.2 2024-08-12 13:45:25,442 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 13:45:29,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 6950, loss[loss=0.1047, beats_loss=0.01061, ecapa_loss=0.0001588, whisper_loss=0.0925, over 21853.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01104, ecapa_loss=0.0001767, whisper_loss=0.09143, over 3859729.15 frames. ], batch size: 84, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:45:40,213 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 13:45:41,446 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 13:45:49,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1663710.0, ans=0.125 2024-08-12 13:45:56,727 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 13:46:00,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-08-12 13:46:15,501 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 13:46:28,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1664010.0, ans=0.0 2024-08-12 13:46:41,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1664010.0, ans=15.0 2024-08-12 13:46:42,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7000, loss[loss=0.08032, beats_loss=0.01035, ecapa_loss=0.0001829, whisper_loss=0.06813, over 15907.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001782, whisper_loss=0.09122, over 3835337.87 frames. ], batch size: 63, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:46:47,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1664110.0, ans=0.125 2024-08-12 13:46:52,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1664110.0, ans=0.1 2024-08-12 13:46:53,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1664110.0, ans=0.0 2024-08-12 13:46:54,592 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 13:47:00,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.453e+01 2.767e+01 3.334e+01 1.862e+02, threshold=5.533e+01, percent-clipped=4.0 2024-08-12 13:47:12,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1664310.0, ans=0.125 2024-08-12 13:47:35,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1664410.0, ans=0.0 2024-08-12 13:47:45,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1664510.0, ans=0.0 2024-08-12 13:47:52,540 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 20 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-12 13:47:54,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1664610.0, ans=0.125 2024-08-12 13:47:55,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7050, loss[loss=0.1072, beats_loss=0.01119, ecapa_loss=0.0002361, whisper_loss=0.09361, over 17768.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.0001788, whisper_loss=0.09173, over 3885402.15 frames. ], batch size: 75, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:48:17,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1664710.0, ans=0.035 2024-08-12 13:48:17,805 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:48:21,939 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 13:48:28,834 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 13:48:30,309 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 13:48:50,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1664910.0, ans=0.0 2024-08-12 13:48:50,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1664910.0, ans=0.125 2024-08-12 13:49:01,775 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 13:49:09,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7100, loss[loss=0.07921, beats_loss=0.01303, ecapa_loss=0.0001406, whisper_loss=0.06477, over 15937.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001769, whisper_loss=0.09124, over 3856136.66 frames. ], batch size: 63, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:49:17,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1665110.0, ans=0.125 2024-08-12 13:49:25,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.512e+01 2.817e+01 3.133e+01 5.318e+01, threshold=5.634e+01, percent-clipped=0.0 2024-08-12 13:49:51,122 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 13:50:09,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1665510.0, ans=0.125 2024-08-12 13:50:13,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1665510.0, ans=0.0 2024-08-12 13:50:21,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1665610.0, ans=0.125 2024-08-12 13:50:22,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7150, loss[loss=0.104, beats_loss=0.01257, ecapa_loss=0.0001678, whisper_loss=0.08977, over 21123.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01112, ecapa_loss=0.0001752, whisper_loss=0.09143, over 3868186.91 frames. ], batch size: 87, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:50:31,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=15.0 2024-08-12 13:50:37,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1665710.0, ans=0.125 2024-08-12 13:50:37,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1665710.0, ans=0.0 2024-08-12 13:50:46,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1665710.0, ans=0.125 2024-08-12 13:51:00,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=15.0 2024-08-12 13:51:10,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1665910.0, ans=0.1 2024-08-12 13:51:27,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1666010.0, ans=0.0 2024-08-12 13:51:36,166 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7200, loss[loss=0.1022, beats_loss=0.01111, ecapa_loss=0.0001853, whisper_loss=0.08924, over 15094.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0111, ecapa_loss=0.0001751, whisper_loss=0.09101, over 3850081.06 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:51:53,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.582e+01 2.995e+01 3.267e+01 4.717e+01, threshold=5.989e+01, percent-clipped=0.0 2024-08-12 13:51:55,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666210.0, ans=0.1 2024-08-12 13:52:12,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1666310.0, ans=0.125 2024-08-12 13:52:20,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-12 13:52:22,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1666410.0, ans=0.125 2024-08-12 13:52:48,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7250, loss[loss=0.09536, beats_loss=0.01232, ecapa_loss=0.0001441, whisper_loss=0.08159, over 22565.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.000175, whisper_loss=0.09162, over 3842298.80 frames. 
], batch size: 89, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:53:17,472 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 13:53:21,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1666810.0, ans=0.125 2024-08-12 13:53:29,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=12.0 2024-08-12 13:53:32,282 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 13:53:34,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1666910.0, ans=0.2 2024-08-12 13:53:37,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1666910.0, ans=0.125 2024-08-12 13:53:53,183 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 13:53:55,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1667010.0, ans=0.125 2024-08-12 13:53:58,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1667010.0, ans=0.125 2024-08-12 13:53:59,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1667010.0, ans=0.1 2024-08-12 13:54:01,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1667010.0, ans=0.2 2024-08-12 13:54:05,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7300, loss[loss=0.114, beats_loss=0.01031, ecapa_loss=0.0001896, whisper_loss=0.1018, over 16803.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.000176, whisper_loss=0.09168, over 3842319.82 frames. ], batch size: 69, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:54:19,213 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 13:54:24,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.453e+01 2.736e+01 3.058e+01 4.580e+01, threshold=5.471e+01, percent-clipped=0.0 2024-08-12 13:54:32,273 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-12 13:54:33,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1667210.0, ans=0.025 2024-08-12 13:54:38,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1667310.0, ans=0.2 2024-08-12 13:54:43,305 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 13:55:01,321 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-12 13:55:01,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1667410.0, ans=0.0 2024-08-12 13:55:15,847 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 13:55:23,496 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7350, loss[loss=0.08831, beats_loss=0.01318, ecapa_loss=0.000199, whisper_loss=0.07314, over 20493.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01112, ecapa_loss=0.0001764, whisper_loss=0.09177, over 3834239.69 frames. ], batch size: 86, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:55:28,566 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-12 13:55:41,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1667710.0, ans=0.125 2024-08-12 13:55:44,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1667710.0, ans=0.025 2024-08-12 13:55:48,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-12 13:55:55,581 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 13:55:58,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1667810.0, ans=0.2 2024-08-12 13:56:01,747 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-12 13:56:06,918 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 13:56:07,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1667810.0, ans=0.0 2024-08-12 13:56:27,294 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 13:56:41,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7400, loss[loss=0.08748, beats_loss=0.01112, ecapa_loss=0.000167, whisper_loss=0.07469, over 20257.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01118, ecapa_loss=0.0001761, whisper_loss=0.09118, over 3852936.97 frames. ], batch size: 84, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:56:52,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. 
limit=15.0 2024-08-12 13:56:58,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.596e+01 2.915e+01 3.233e+01 4.650e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-12 13:56:59,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-08-12 13:57:05,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1668210.0, ans=0.125 2024-08-12 13:57:05,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=38.86 vs. limit=22.5 2024-08-12 13:57:19,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-12 13:57:20,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 13:57:31,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1668410.0, ans=0.07 2024-08-12 13:57:52,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1668510.0, ans=0.125 2024-08-12 13:57:55,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7450, loss[loss=0.1192, beats_loss=0.01043, ecapa_loss=0.0001706, whisper_loss=0.1071, over 22384.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01115, ecapa_loss=0.0001758, whisper_loss=0.09172, over 3879723.15 frames. 
], batch size: 89, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:57:58,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-12 13:57:59,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1668610.0, ans=0.1 2024-08-12 13:58:03,657 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 13:58:10,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=15.0 2024-08-12 13:59:08,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1669010.0, ans=0.125 2024-08-12 13:59:12,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7500, loss[loss=0.1098, beats_loss=0.01524, ecapa_loss=0.0001294, whisper_loss=0.09325, over 23724.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01121, ecapa_loss=0.0001747, whisper_loss=0.09116, over 3896094.89 frames. ], batch size: 93, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:59:15,879 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 13:59:19,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1669110.0, ans=0.2 2024-08-12 13:59:25,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1669110.0, ans=0.0 2024-08-12 13:59:28,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. 
limit=15.0 2024-08-12 13:59:30,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.570e+01 2.878e+01 3.293e+01 5.497e+01, threshold=5.755e+01, percent-clipped=0.0 2024-08-12 13:59:38,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1669210.0, ans=0.125 2024-08-12 13:59:39,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1669210.0, ans=0.125 2024-08-12 13:59:42,285 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 13:59:51,285 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 13:59:56,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1669410.0, ans=0.0 2024-08-12 14:00:13,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1669510.0, ans=0.125 2024-08-12 14:00:16,691 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.115e+00 2024-08-12 14:00:24,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1669510.0, ans=0.125 2024-08-12 14:00:26,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7550, loss[loss=0.1062, beats_loss=0.01267, ecapa_loss=0.0001576, whisper_loss=0.09198, over 23339.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01115, ecapa_loss=0.0001766, whisper_loss=0.09091, over 3879525.25 frames. 
], batch size: 92, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:00:49,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1669710.0, ans=0.1 2024-08-12 14:00:52,999 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-12 14:01:09,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1669810.0, ans=0.1 2024-08-12 14:01:24,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1669910.0, ans=0.1 2024-08-12 14:01:35,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1670010.0, ans=0.07 2024-08-12 14:01:40,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7600, loss[loss=0.1278, beats_loss=0.009401, ecapa_loss=0.0001716, whisper_loss=0.1167, over 23420.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01112, ecapa_loss=0.0001764, whisper_loss=0.09093, over 3860937.99 frames. ], batch size: 88, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:01:46,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1670110.0, ans=0.0 2024-08-12 14:01:59,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.500e+01 2.707e+01 3.102e+01 5.200e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 14:02:04,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1670210.0, ans=0.125 2024-08-12 14:02:04,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. 
limit=15.0 2024-08-12 14:02:05,086 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 14:02:05,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1670210.0, ans=0.1 2024-08-12 14:02:40,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1670410.0, ans=0.5 2024-08-12 14:02:43,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1670510.0, ans=0.125 2024-08-12 14:02:43,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1670510.0, ans=0.125 2024-08-12 14:02:45,829 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 14:02:50,411 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-12 14:02:54,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1670510.0, ans=0.125 2024-08-12 14:02:57,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7650, loss[loss=0.09639, beats_loss=0.01234, ecapa_loss=0.0001665, whisper_loss=0.08239, over 22894.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0111, ecapa_loss=0.0001757, whisper_loss=0.09105, over 3861722.60 frames. ], batch size: 91, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:03:04,032 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 14:03:14,688 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.845e-02 2024-08-12 14:03:23,371 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 14:03:31,291 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 14:03:39,892 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 14:03:44,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1670910.0, ans=0.125 2024-08-12 14:03:47,864 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 14:03:52,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1670910.0, ans=0.125 2024-08-12 14:04:07,608 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.576e+00 2024-08-12 14:04:10,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1671010.0, ans=0.0 2024-08-12 14:04:13,358 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 14:04:14,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7700, loss[loss=0.08683, beats_loss=0.01282, ecapa_loss=0.0001439, whisper_loss=0.07257, over 16509.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001769, whisper_loss=0.09147, over 3887314.61 frames. ], batch size: 65, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:04:16,353 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
22 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-12 14:04:32,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1671210.0, ans=22.5 2024-08-12 14:04:33,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.546e+01 2.810e+01 3.287e+01 1.654e+02, threshold=5.620e+01, percent-clipped=2.0 2024-08-12 14:05:07,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-08-12 14:05:11,554 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 14:05:18,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1671510.0, ans=0.125 2024-08-12 14:05:36,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7750, loss[loss=0.1227, beats_loss=0.01127, ecapa_loss=0.000179, whisper_loss=0.1096, over 20265.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01098, ecapa_loss=0.0001771, whisper_loss=0.09077, over 3869563.20 frames. ], batch size: 80, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:06:02,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1671710.0, ans=0.07 2024-08-12 14:06:06,960 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 14:06:18,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1671810.0, ans=0.0 2024-08-12 14:06:21,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1671810.0, ans=0.0 2024-08-12 14:06:23,093 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
19 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-12 14:06:39,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1671910.0, ans=0.125 2024-08-12 14:06:48,415 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-12 14:06:56,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1672010.0, ans=0.125 2024-08-12 14:06:57,305 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 14:06:58,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1672010.0, ans=0.125 2024-08-12 14:07:02,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7800, loss[loss=0.1372, beats_loss=0.007917, ecapa_loss=0.0002025, whisper_loss=0.1273, over 21916.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01098, ecapa_loss=0.0001775, whisper_loss=0.09131, over 3877489.47 frames. ], batch size: 87, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:07:22,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1672210.0, ans=0.125 2024-08-12 14:07:23,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.562e+01 2.777e+01 3.107e+01 5.363e+01, threshold=5.555e+01, percent-clipped=0.0 2024-08-12 14:07:27,296 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 14:07:31,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1672210.0, ans=0.1 2024-08-12 14:07:37,867 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 14:07:41,865 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 14:07:45,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1672310.0, ans=0.2 2024-08-12 14:08:00,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1672410.0, ans=0.1 2024-08-12 14:08:11,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1672510.0, ans=0.0 2024-08-12 14:08:13,122 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 32 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 14:08:20,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1672510.0, ans=0.125 2024-08-12 14:08:28,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7850, loss[loss=0.1075, beats_loss=0.007819, ecapa_loss=0.0002388, whisper_loss=0.09729, over 13570.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01109, ecapa_loss=0.0001769, whisper_loss=0.09129, over 3877079.38 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:08:32,257 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 14:09:22,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1672910.0, ans=0.0 2024-08-12 14:09:28,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. 
limit=15.0 2024-08-12 14:09:35,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1672910.0, ans=0.1 2024-08-12 14:09:59,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7900, loss[loss=0.1157, beats_loss=0.01077, ecapa_loss=0.0001666, whisper_loss=0.1032, over 23572.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01115, ecapa_loss=0.0001757, whisper_loss=0.09167, over 3887604.39 frames. ], batch size: 90, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:10:10,202 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 14:10:17,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.710e+01 2.918e+01 3.314e+01 4.550e+01, threshold=5.837e+01, percent-clipped=0.0 2024-08-12 14:10:20,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2024-08-12 14:10:26,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1673210.0, ans=0.125 2024-08-12 14:10:27,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1673210.0, ans=0.1 2024-08-12 14:10:29,491 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 14:10:31,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1673310.0, ans=0.125 2024-08-12 14:10:34,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. 
limit=15.0 2024-08-12 14:10:44,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1673310.0, ans=0.125 2024-08-12 14:11:13,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1673510.0, ans=0.0 2024-08-12 14:11:18,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 7950, loss[loss=0.114, beats_loss=0.009658, ecapa_loss=0.0001767, whisper_loss=0.1026, over 23049.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01116, ecapa_loss=0.0001749, whisper_loss=0.09125, over 3908806.22 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:11:25,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-12 14:11:46,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1673710.0, ans=0.125 2024-08-12 14:12:35,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1674010.0, ans=0.05 2024-08-12 14:12:47,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8000, loss[loss=0.1253, beats_loss=0.009117, ecapa_loss=0.00015, whisper_loss=0.1147, over 22920.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.0001754, whisper_loss=0.09187, over 3896194.03 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:12:53,959 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 14:12:59,101 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:13:07,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1674210.0, ans=0.0 2024-08-12 14:13:07,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.597e+01 2.930e+01 3.466e+01 8.592e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 14:13:35,920 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 14:13:37,673 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 14:13:44,805 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 14:13:47,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5 2024-08-12 14:13:59,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1674510.0, ans=0.1 2024-08-12 14:14:02,388 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 35 from Vox, 37 fro AS 2024-08-12 14:14:12,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1674510.0, ans=0.2 2024-08-12 14:14:16,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8050, loss[loss=0.1066, beats_loss=0.01107, ecapa_loss=0.0001803, whisper_loss=0.09377, over 22307.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01094, ecapa_loss=0.0001757, whisper_loss=0.09217, over 3897796.59 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:14:29,288 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 14:14:45,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1674710.0, ans=0.0 2024-08-12 14:15:19,380 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 32 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-12 14:15:26,075 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-12 14:15:26,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1674910.0, ans=0.0 2024-08-12 14:15:44,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1675010.0, ans=0.2 2024-08-12 14:15:46,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1675010.0, ans=0.125 2024-08-12 14:15:51,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8100, loss[loss=0.06823, beats_loss=0.01234, ecapa_loss=0.0001489, whisper_loss=0.0544, over 15036.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01099, ecapa_loss=0.000175, whisper_loss=0.09166, over 3872171.84 frames. ], batch size: 60, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:16:03,057 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 14:16:12,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.342e+01 2.574e+01 2.867e+01 4.166e+01, threshold=5.148e+01, percent-clipped=0.0 2024-08-12 14:16:27,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5 2024-08-12 14:16:31,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. 
limit=22.5 2024-08-12 14:16:49,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1675410.0, ans=0.1 2024-08-12 14:16:59,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.16 vs. limit=12.0 2024-08-12 14:17:01,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1675510.0, ans=0.125 2024-08-12 14:17:18,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8150, loss[loss=0.1172, beats_loss=0.009478, ecapa_loss=0.0001709, whisper_loss=0.106, over 21912.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001755, whisper_loss=0.09168, over 3875368.72 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:17:25,518 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-12 14:17:49,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1675710.0, ans=0.2 2024-08-12 14:17:55,403 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 14:18:11,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1675810.0, ans=0.0 2024-08-12 14:18:19,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-12 14:18:24,529 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 14:18:50,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8200, loss[loss=0.08545, beats_loss=0.01272, ecapa_loss=0.0001908, whisper_loss=0.07082, over 21668.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001753, whisper_loss=0.09233, over 3904051.40 frames. ], batch size: 92, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:19:04,498 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 14:19:12,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.595e+01 2.929e+01 3.219e+01 5.675e+01, threshold=5.858e+01, percent-clipped=2.0 2024-08-12 14:19:13,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1676210.0, ans=0.1 2024-08-12 14:19:20,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.41 vs. limit=15.0 2024-08-12 14:19:29,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1676310.0, ans=0.2 2024-08-12 14:19:29,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1676310.0, ans=0.04949747468305833 2024-08-12 14:20:00,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1676510.0, ans=0.0 2024-08-12 14:20:07,436 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 14:20:09,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1676510.0, ans=0.0 2024-08-12 14:20:15,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8250, loss[loss=0.1059, beats_loss=0.008708, ecapa_loss=0.0001994, whisper_loss=0.09521, over 21464.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001774, whisper_loss=0.09204, over 3932074.66 frames. 
], batch size: 87, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:20:16,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1676610.0, ans=0.0 2024-08-12 14:20:44,217 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 14:21:05,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1676810.0, ans=0.125 2024-08-12 14:21:19,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1676910.0, ans=0.0 2024-08-12 14:21:27,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1677010.0, ans=0.125 2024-08-12 14:21:43,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1677010.0, ans=0.0 2024-08-12 14:21:46,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8300, loss[loss=0.07747, beats_loss=0.009507, ecapa_loss=0.0001746, whisper_loss=0.06622, over 19112.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.011, ecapa_loss=0.0001763, whisper_loss=0.09137, over 3929874.27 frames. ], batch size: 78, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:21:47,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1677110.0, ans=0.0 2024-08-12 14:21:58,467 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 14:22:06,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.461e+01 2.729e+01 3.210e+01 2.355e+02, threshold=5.459e+01, percent-clipped=3.0 2024-08-12 14:22:19,167 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 14:22:31,245 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 14:22:52,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1677410.0, ans=0.05 2024-08-12 14:23:03,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1677510.0, ans=0.0 2024-08-12 14:23:12,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8350, loss[loss=0.1133, beats_loss=0.01046, ecapa_loss=0.000161, whisper_loss=0.1013, over 21183.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01098, ecapa_loss=0.0001766, whisper_loss=0.09173, over 3926274.16 frames. ], batch size: 82, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:23:23,501 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 14:23:24,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2024-08-12 14:23:27,642 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 14:23:44,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-12 14:24:02,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-12 14:24:14,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1677910.0, ans=0.0 2024-08-12 14:24:19,038 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 14:24:37,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1678110.0, ans=0.125 2024-08-12 14:24:37,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1678110.0, ans=0.125 2024-08-12 14:24:38,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8400, loss[loss=0.07245, beats_loss=0.01196, ecapa_loss=0.0001734, whisper_loss=0.05876, over 16875.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0001768, whisper_loss=0.09127, over 3932820.92 frames. ], batch size: 69, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:24:46,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1678110.0, ans=0.2 2024-08-12 14:24:57,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1678210.0, ans=0.125 2024-08-12 14:24:59,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-12 14:24:59,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.506e+01 2.766e+01 3.211e+01 4.644e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:25:02,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1678210.0, ans=0.125 2024-08-12 14:25:02,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1678210.0, ans=0.125 2024-08-12 14:25:08,899 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
27 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-12 14:25:17,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1678310.0, ans=0.0 2024-08-12 14:25:31,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2024-08-12 14:25:44,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1678410.0, ans=0.125 2024-08-12 14:25:47,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1678510.0, ans=0.0 2024-08-12 14:25:54,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-12 14:25:55,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1678510.0, ans=0.125 2024-08-12 14:26:02,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8450, loss[loss=0.1202, beats_loss=0.008846, ecapa_loss=0.0002161, whisper_loss=0.1092, over 22185.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001772, whisper_loss=0.09159, over 3909599.66 frames. ], batch size: 90, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:26:18,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1678710.0, ans=0.05 2024-08-12 14:26:21,371 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 14:26:41,138 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 14:27:09,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1679010.0, ans=0.0 2024-08-12 14:27:24,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8500, loss[loss=0.09991, beats_loss=0.01344, ecapa_loss=0.0001275, whisper_loss=0.0852, over 17288.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01104, ecapa_loss=0.0001766, whisper_loss=0.0908, over 3864649.52 frames. ], batch size: 69, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:27:34,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1679110.0, ans=0.125 2024-08-12 14:27:39,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1679210.0, ans=0.1 2024-08-12 14:27:44,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.527e+01 2.828e+01 3.185e+01 5.995e+01, threshold=5.655e+01, percent-clipped=1.0 2024-08-12 14:27:50,683 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 14:28:03,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1679310.0, ans=0.0 2024-08-12 14:28:22,071 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-12 14:28:36,547 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 14:28:48,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1679510.0, ans=0.0 2024-08-12 14:28:54,204 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
32 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-12 14:28:55,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8550, loss[loss=0.1123, beats_loss=0.009096, ecapa_loss=0.0002049, whisper_loss=0.1011, over 22573.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001747, whisper_loss=0.09165, over 3877402.18 frames. ], batch size: 94, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:29:01,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1679610.0, ans=0.1 2024-08-12 14:29:24,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1679710.0, ans=0.2 2024-08-12 14:29:34,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. limit=6.0 2024-08-12 14:30:03,199 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 14:30:22,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-12 14:30:32,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8600, loss[loss=0.09614, beats_loss=0.01333, ecapa_loss=0.000164, whisper_loss=0.08117, over 21419.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.0001756, whisper_loss=0.09188, over 3879303.13 frames. 
], batch size: 89, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:30:42,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1680110.0, ans=0.125 2024-08-12 14:30:55,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.576e+01 2.836e+01 3.188e+01 4.951e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-12 14:31:00,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1680210.0, ans=0.2 2024-08-12 14:31:09,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1680310.0, ans=0.125 2024-08-12 14:31:23,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1680310.0, ans=0.125 2024-08-12 14:31:31,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1680410.0, ans=0.125 2024-08-12 14:31:31,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1680410.0, ans=0.0 2024-08-12 14:31:33,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1680410.0, ans=0.125 2024-08-12 14:31:34,228 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 14:31:43,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=12.0 2024-08-12 14:31:48,037 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 14:31:54,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8650, loss[loss=0.08756, beats_loss=0.009056, ecapa_loss=0.0002177, whisper_loss=0.07632, over 12664.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001768, whisper_loss=0.09199, over 3855218.93 frames. ], batch size: 54, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:32:00,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1680610.0, ans=0.035 2024-08-12 14:32:05,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1680610.0, ans=0.2 2024-08-12 14:32:06,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1680610.0, ans=0.2 2024-08-12 14:32:09,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1680710.0, ans=0.025 2024-08-12 14:32:16,643 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 14:32:29,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-08-12 14:32:36,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1680910.0, ans=0.125 2024-08-12 14:32:58,675 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 14:33:07,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8700, loss[loss=0.1009, beats_loss=0.009221, ecapa_loss=0.0002244, whisper_loss=0.08948, over 13614.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001767, whisper_loss=0.09156, over 3862743.95 frames. 
], batch size: 59, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:33:19,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1681110.0, ans=0.125 2024-08-12 14:33:25,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.512e+01 2.777e+01 3.126e+01 4.363e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 14:33:31,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1681210.0, ans=0.1 2024-08-12 14:33:32,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1681210.0, ans=0.1 2024-08-12 14:33:50,470 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 14:34:03,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2024-08-12 14:34:10,016 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:34:20,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=15.0 2024-08-12 14:34:21,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8750, loss[loss=0.1073, beats_loss=0.01131, ecapa_loss=0.0001684, whisper_loss=0.09432, over 18454.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01099, ecapa_loss=0.0001766, whisper_loss=0.09164, over 3873400.74 frames. ], batch size: 75, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:34:30,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1681610.0, ans=0.2 2024-08-12 14:34:35,246 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 14:35:12,878 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 14:35:13,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1681910.0, ans=0.125 2024-08-12 14:35:21,546 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 14:35:28,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1682010.0, ans=0.125 2024-08-12 14:35:31,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1682010.0, ans=0.0 2024-08-12 14:35:33,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8800, loss[loss=0.1102, beats_loss=0.01162, ecapa_loss=0.0001473, whisper_loss=0.09708, over 24969.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001756, whisper_loss=0.09202, over 3882127.96 frames. ], batch size: 96, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:35:34,231 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 14:35:38,248 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 14:35:51,510 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 14:35:53,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.96 vs. 
limit=15.0 2024-08-12 14:35:53,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.570e+01 2.828e+01 3.387e+01 1.190e+02, threshold=5.656e+01, percent-clipped=1.0 2024-08-12 14:36:02,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-12 14:36:20,683 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 14:36:24,865 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 14:36:50,013 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 14:36:56,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8850, loss[loss=0.09846, beats_loss=0.01107, ecapa_loss=0.000151, whisper_loss=0.08589, over 22287.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01106, ecapa_loss=0.0001741, whisper_loss=0.09211, over 3891973.12 frames. ], batch size: 88, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:36:56,722 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 14:37:08,160 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 14:37:13,849 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 14:37:17,270 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 14:37:25,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0 2024-08-12 14:37:36,473 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 14:38:03,372 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 14:38:16,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8900, loss[loss=0.1016, beats_loss=0.01123, ecapa_loss=0.0001488, whisper_loss=0.08885, over 17075.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01109, ecapa_loss=0.0001732, whisper_loss=0.09198, over 3859725.42 frames. ], batch size: 67, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:38:27,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:38:37,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.454e+01 2.719e+01 3.172e+01 4.928e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-12 14:38:38,221 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 14:38:45,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1683210.0, ans=0.125 2024-08-12 14:39:06,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1683410.0, ans=0.2 2024-08-12 14:39:10,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-12 14:39:38,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 8950, loss[loss=0.116, beats_loss=0.01113, ecapa_loss=0.0001557, whisper_loss=0.1033, over 14589.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.0001734, whisper_loss=0.09198, over 3849950.05 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:39:39,453 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 14:39:41,142 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 14:39:49,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-12 14:39:59,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683710.0, ans=0.1 2024-08-12 14:40:05,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1683710.0, ans=0.125 2024-08-12 14:40:09,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1683710.0, ans=0.2 2024-08-12 14:40:30,471 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-12 14:40:35,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683910.0, ans=0.1 2024-08-12 14:40:59,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9000, loss[loss=0.1129, beats_loss=0.0103, ecapa_loss=0.0001646, whisper_loss=0.1009, over 23955.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001735, whisper_loss=0.09227, over 3853851.11 frames. ], batch size: 95, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:40:59,043 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 14:41:38,402 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.000585, whisper_loss=0.2487, over 922467.00 frames. 2024-08-12 14:41:57,468 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on SV_voxceleb1: loss=0.004785, beats_loss=0, ecapa_loss=0.0004785, whisper_loss=0, over 939242.00 frames. 
2024-08-12 14:43:56,679 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on AT_audioset: loss=0.02422, beats_loss=0.02422, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 14:43:56,683 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 14:44:15,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.474e+01 2.766e+01 3.028e+01 3.985e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:44:25,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1684210.0, ans=0.125 2024-08-12 14:44:26,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2024-08-12 14:44:35,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-08-12 14:44:49,348 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 14:44:49,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1684410.0, ans=0.0 2024-08-12 14:44:49,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2024-08-12 14:44:54,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1684410.0, ans=0.125 2024-08-12 14:45:15,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9050, loss[loss=0.09567, beats_loss=0.009533, ecapa_loss=0.0002269, whisper_loss=0.08387, over 22534.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001749, whisper_loss=0.09252, over 3849612.47 frames. 
], batch size: 91, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:45:46,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1684810.0, ans=0.125 2024-08-12 14:46:00,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.07 vs. limit=10.0 2024-08-12 14:46:03,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1684910.0, ans=0.1 2024-08-12 14:46:17,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1684910.0, ans=0.125 2024-08-12 14:46:20,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1685010.0, ans=0.125 2024-08-12 14:46:24,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1685010.0, ans=0.0 2024-08-12 14:46:35,114 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9100, loss[loss=0.1078, beats_loss=0.01297, ecapa_loss=0.0001351, whisper_loss=0.09344, over 18681.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001763, whisper_loss=0.09267, over 3825676.42 frames. ], batch size: 71, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:46:38,132 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 14:46:48,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1685210.0, ans=0.125 2024-08-12 14:46:51,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1685210.0, ans=0.1 2024-08-12 14:46:52,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.564e+01 2.836e+01 3.271e+01 5.149e+01, threshold=5.673e+01, percent-clipped=0.0 2024-08-12 14:46:58,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1685210.0, ans=0.09899494936611666 2024-08-12 14:47:01,347 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-12 14:47:03,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1685210.0, ans=0.125 2024-08-12 14:47:09,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1685310.0, ans=0.09899494936611666 2024-08-12 14:47:14,363 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 14:47:18,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1685310.0, ans=0.125 2024-08-12 14:47:25,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1685410.0, ans=0.1 2024-08-12 14:47:41,647 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 14:47:48,612 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 14:47:51,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9150, loss[loss=0.07628, beats_loss=0.0139, ecapa_loss=0.0001717, whisper_loss=0.06067, over 18153.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.0001765, whisper_loss=0.09196, over 3832669.18 frames. ], batch size: 74, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:47:58,810 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 14:48:01,737 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 14:48:04,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1685610.0, ans=0.07 2024-08-12 14:48:32,186 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-12 14:48:43,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1685910.0, ans=0.0 2024-08-12 14:48:45,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-12 14:48:52,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1686010.0, ans=0.07 2024-08-12 14:48:58,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-08-12 14:49:04,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. 
limit=15.0 2024-08-12 14:49:05,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1686110.0, ans=0.1 2024-08-12 14:49:06,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9200, loss[loss=0.09104, beats_loss=0.0137, ecapa_loss=0.0001803, whisper_loss=0.07553, over 19400.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001763, whisper_loss=0.09158, over 3843514.25 frames. ], batch size: 81, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:49:22,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1686210.0, ans=0.125 2024-08-12 14:49:23,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.540e+01 2.969e+01 3.284e+01 5.041e+01, threshold=5.938e+01, percent-clipped=0.0 2024-08-12 14:49:28,247 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 14:49:56,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1686410.0, ans=0.125 2024-08-12 14:50:04,483 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:50:24,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9250, loss[loss=0.101, beats_loss=0.013, ecapa_loss=0.0001578, whisper_loss=0.08638, over 22361.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001765, whisper_loss=0.09179, over 3872291.36 frames. ], batch size: 91, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:50:38,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1686610.0, ans=0.2 2024-08-12 14:51:19,290 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 14:51:25,987 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 14:51:33,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1687010.0, ans=0.0 2024-08-12 14:51:42,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1687010.0, ans=0.1 2024-08-12 14:51:42,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2024-08-12 14:51:46,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1687010.0, ans=0.0 2024-08-12 14:51:49,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9300, loss[loss=0.1153, beats_loss=0.008487, ecapa_loss=0.0001848, whisper_loss=0.105, over 14384.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001775, whisper_loss=0.09212, over 3888412.46 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:51:57,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1687110.0, ans=0.0 2024-08-12 14:52:00,411 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 14:52:09,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.511e+01 2.773e+01 3.215e+01 9.080e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-12 14:52:40,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1687410.0, ans=0.125 2024-08-12 14:52:58,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1687510.0, ans=0.125 2024-08-12 14:52:58,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1687510.0, ans=0.125 2024-08-12 14:53:14,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9350, loss[loss=0.1153, beats_loss=0.01126, ecapa_loss=0.0001862, whisper_loss=0.1021, over 18018.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001762, whisper_loss=0.09253, over 3887307.97 frames. ], batch size: 70, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:53:29,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2024-08-12 14:53:40,231 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 14:54:09,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2024-08-12 14:54:25,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687910.0, ans=0.1 2024-08-12 14:54:29,908 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
23 from LS+wenet, 32 from Vox, 22 fro AS 2024-08-12 14:54:52,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9400, loss[loss=0.08375, beats_loss=0.01336, ecapa_loss=0.0001552, whisper_loss=0.06884, over 21135.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001769, whisper_loss=0.0924, over 3888825.41 frames. ], batch size: 88, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:55:00,931 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 28 from LS+wenet, 8 from Vox, 19 fro AS 2024-08-12 14:55:01,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1688110.0, ans=0.0 2024-08-12 14:55:18,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.357e+01 2.577e+01 2.940e+01 4.355e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-12 14:55:22,903 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 14:55:23,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1688210.0, ans=0.0 2024-08-12 14:55:26,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1688210.0, ans=0.125 2024-08-12 14:55:30,141 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
26 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 14:55:47,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1688310.0, ans=0.5 2024-08-12 14:56:17,866 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:56:17,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1688510.0, ans=0.125 2024-08-12 14:56:29,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9450, loss[loss=0.1052, beats_loss=0.01182, ecapa_loss=0.000159, whisper_loss=0.09177, over 21795.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01087, ecapa_loss=0.0001756, whisper_loss=0.09232, over 3858082.65 frames. ], batch size: 86, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:56:36,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=22.5 2024-08-12 14:57:22,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0 2024-08-12 14:57:25,196 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 14:57:26,173 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09947884827852249, model_norm_threshold=51.535552978515625 2024-08-12 14:57:26,352 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.99, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.656e+05, grad_sumsq=2.952e+04, orig_rms_sq=8.999e+00 2024-08-12 14:57:26,977 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 14:57:32,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1688910.0, ans=0.05 2024-08-12 14:57:34,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1688910.0, ans=0.0 2024-08-12 14:57:39,652 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 14:57:56,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1689010.0, ans=0.125 2024-08-12 14:58:02,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9500, loss[loss=0.08954, beats_loss=0.01258, ecapa_loss=0.0001627, whisper_loss=0.07533, over 14403.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001756, whisper_loss=0.09167, over 3864226.29 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:58:12,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1689110.0, ans=0.125 2024-08-12 14:58:25,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.537e+01 2.807e+01 3.213e+01 5.181e+02, threshold=5.615e+01, percent-clipped=1.0 2024-08-12 14:58:43,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1689310.0, ans=0.125 2024-08-12 14:58:45,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1689310.0, ans=0.04949747468305833 2024-08-12 14:59:13,983 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 14:59:15,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1689510.0, ans=0.0 2024-08-12 14:59:16,554 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 14:59:26,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9550, loss[loss=0.0796, beats_loss=0.01545, ecapa_loss=0.0001276, whisper_loss=0.06288, over 20396.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01107, ecapa_loss=0.0001756, whisper_loss=0.09046, over 3857532.17 frames. ], batch size: 82, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:59:56,118 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 14:59:57,415 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 15:00:16,634 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 39 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 15:00:29,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1690010.0, ans=0.0 2024-08-12 15:00:35,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9600, loss[loss=0.09163, beats_loss=0.01116, ecapa_loss=0.0001414, whisper_loss=0.07905, over 15804.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01105, ecapa_loss=0.000175, whisper_loss=0.0909, over 3864221.84 frames. ], batch size: 59, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:00:41,498 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-12 15:00:49,481 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 15:00:53,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.591e+01 2.857e+01 3.252e+01 5.691e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 15:00:53,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1690210.0, ans=0.0 2024-08-12 15:00:53,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1690210.0, ans=0.1 2024-08-12 15:01:12,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690310.0, ans=0.1 2024-08-12 15:01:40,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1690510.0, ans=0.04949747468305833 2024-08-12 15:01:42,854 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 15:01:44,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9650, loss[loss=0.1238, beats_loss=0.01148, ecapa_loss=0.0002048, whisper_loss=0.1103, over 21793.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01102, ecapa_loss=0.0001766, whisper_loss=0.09095, over 3827331.32 frames. ], batch size: 90, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:02:06,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1690710.0, ans=0.0 2024-08-12 15:02:11,746 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 15:02:27,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1690910.0, ans=0.125 2024-08-12 15:02:28,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1690910.0, ans=0.125 2024-08-12 15:02:43,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1691010.0, ans=0.0 2024-08-12 15:02:43,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1691010.0, ans=0.125 2024-08-12 15:02:45,728 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 15:02:53,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9700, loss[loss=0.1079, beats_loss=0.01108, ecapa_loss=0.0001676, whisper_loss=0.09518, over 23342.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001773, whisper_loss=0.09102, over 3807289.01 frames. ], batch size: 91, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:03:06,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1691210.0, ans=10.0 2024-08-12 15:03:10,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.535e+01 2.821e+01 3.429e+01 6.519e+01, threshold=5.641e+01, percent-clipped=1.0 2024-08-12 15:03:10,898 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 15:03:33,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1691410.0, ans=0.125 2024-08-12 15:03:56,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. 
limit=22.5 2024-08-12 15:04:00,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2024-08-12 15:04:04,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9750, loss[loss=0.09519, beats_loss=0.01033, ecapa_loss=0.0002092, whisper_loss=0.08277, over 16951.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01106, ecapa_loss=0.000176, whisper_loss=0.0903, over 3824457.17 frames. ], batch size: 67, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:04:11,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1691610.0, ans=0.1 2024-08-12 15:04:15,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1691610.0, ans=0.125 2024-08-12 15:04:19,875 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.426e+00 2024-08-12 15:04:23,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1691710.0, ans=0.125 2024-08-12 15:04:25,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2024-08-12 15:04:29,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1691710.0, ans=0.125 2024-08-12 15:04:30,433 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 15:05:06,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1692010.0, ans=0.1 2024-08-12 15:05:12,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9800, loss[loss=0.1116, beats_loss=0.01052, ecapa_loss=0.0001738, whisper_loss=0.09933, over 24047.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01107, ecapa_loss=0.0001752, whisper_loss=0.09067, over 3829108.97 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:05:22,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=15.0 2024-08-12 15:05:24,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1692210.0, ans=0.125 2024-08-12 15:05:30,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.561e+01 2.818e+01 3.285e+01 1.389e+02, threshold=5.636e+01, percent-clipped=4.0 2024-08-12 15:06:03,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1692410.0, ans=0.0 2024-08-12 15:06:11,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1692510.0, ans=0.125 2024-08-12 15:06:16,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-08-12 15:06:19,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9850, loss[loss=0.1135, beats_loss=0.01026, ecapa_loss=0.000213, whisper_loss=0.1011, over 20495.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01107, ecapa_loss=0.0001756, whisper_loss=0.09082, over 3853082.71 frames. 
], batch size: 85, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:06:22,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1692610.0, ans=0.125 2024-08-12 15:06:30,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2024-08-12 15:06:36,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1692710.0, ans=0.125 2024-08-12 15:06:38,783 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 15:07:01,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1692910.0, ans=0.0 2024-08-12 15:07:11,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1692910.0, ans=0.2 2024-08-12 15:07:13,470 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 15:07:28,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9900, loss[loss=0.09265, beats_loss=0.01227, ecapa_loss=0.0001606, whisper_loss=0.07877, over 22785.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01116, ecapa_loss=0.0001757, whisper_loss=0.09034, over 3875587.93 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:07:30,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1693110.0, ans=0.0 2024-08-12 15:07:37,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1693110.0, ans=0.125 2024-08-12 15:07:38,352 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
15 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 15:07:44,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1693210.0, ans=0.125 2024-08-12 15:07:46,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.533e+01 2.789e+01 3.190e+01 6.872e+01, threshold=5.578e+01, percent-clipped=1.0 2024-08-12 15:07:59,372 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-12 15:08:03,629 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 15:08:07,738 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 15:08:08,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=12.0 2024-08-12 15:08:34,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-08-12 15:08:37,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1693610.0, ans=0.07 2024-08-12 15:08:38,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 9950, loss[loss=0.09983, beats_loss=0.01053, ecapa_loss=0.0002302, whisper_loss=0.087, over 21924.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01114, ecapa_loss=0.0001761, whisper_loss=0.09058, over 3858450.65 frames. 
], batch size: 94, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:08:45,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1693610.0, ans=0.2 2024-08-12 15:08:53,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1693710.0, ans=0.125 2024-08-12 15:08:57,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1693710.0, ans=0.2 2024-08-12 15:09:00,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1693710.0, ans=0.0 2024-08-12 15:09:10,038 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 15:09:15,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1693810.0, ans=0.0 2024-08-12 15:09:29,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1693910.0, ans=0.125 2024-08-12 15:09:51,434 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 15:09:52,826 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-12 15:09:54,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10000, loss[loss=0.1142, beats_loss=0.009501, ecapa_loss=0.0002054, whisper_loss=0.1027, over 19280.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01113, ecapa_loss=0.0001762, whisper_loss=0.0908, over 3866508.45 frames. ], batch size: 81, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:09:59,594 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-12 15:10:04,882 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 15:10:11,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.583e+01 2.831e+01 3.339e+01 3.966e+02, threshold=5.663e+01, percent-clipped=2.0 2024-08-12 15:10:34,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1694410.0, ans=0.0 2024-08-12 15:10:37,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1694410.0, ans=0.2 2024-08-12 15:10:38,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-08-12 15:10:43,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1694410.0, ans=0.1 2024-08-12 15:10:44,296 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 32 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 15:10:46,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1694410.0, ans=0.0 2024-08-12 15:10:57,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1694510.0, ans=0.035 2024-08-12 15:11:01,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10050, loss[loss=0.08313, beats_loss=0.01502, ecapa_loss=0.0001657, whisper_loss=0.06645, over 22296.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01114, ecapa_loss=0.0001752, whisper_loss=0.09114, over 3878706.68 frames. ], batch size: 93, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:11:06,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1694610.0, ans=0.125 2024-08-12 15:11:28,899 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 15:11:30,176 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 15:11:34,810 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 15:11:36,469 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-12 15:11:40,332 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 15:11:47,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1694910.0, ans=0.125 2024-08-12 15:12:10,374 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 15:12:14,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10100, loss[loss=0.1358, beats_loss=0.007719, ecapa_loss=0.0001723, whisper_loss=0.1264, over 18089.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001746, whisper_loss=0.09168, over 3880379.46 frames. ], batch size: 67, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:12:28,191 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 15:12:29,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1695210.0, ans=0.0 2024-08-12 15:12:30,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.43 vs. 
limit=15.0 2024-08-12 15:12:33,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.463e+01 2.716e+01 3.042e+01 6.161e+01, threshold=5.433e+01, percent-clipped=3.0 2024-08-12 15:12:49,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1695310.0, ans=0.1 2024-08-12 15:12:56,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1695310.0, ans=0.0 2024-08-12 15:13:00,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1695410.0, ans=0.125 2024-08-12 15:13:01,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2024-08-12 15:13:04,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1695410.0, ans=0.0 2024-08-12 15:13:05,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1695410.0, ans=0.125 2024-08-12 15:13:11,441 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 15:13:28,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10150, loss[loss=0.1056, beats_loss=0.009403, ecapa_loss=0.0001986, whisper_loss=0.09423, over 18290.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01107, ecapa_loss=0.0001761, whisper_loss=0.09064, over 3875205.75 frames. ], batch size: 76, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:13:29,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.23 vs. 
limit=12.0 2024-08-12 15:13:47,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1695710.0, ans=0.1 2024-08-12 15:13:52,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1695710.0, ans=0.1 2024-08-12 15:13:53,455 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 15:13:54,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-08-12 15:13:54,840 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 15:14:28,200 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 15:14:32,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1696010.0, ans=0.125 2024-08-12 15:14:36,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10200, loss[loss=0.09638, beats_loss=0.01199, ecapa_loss=0.0001806, whisper_loss=0.08258, over 13392.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001756, whisper_loss=0.0918, over 3872729.00 frames. 
], batch size: 54, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:14:54,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.514e+01 2.832e+01 3.281e+01 6.809e+01, threshold=5.664e+01, percent-clipped=1.0 2024-08-12 15:14:57,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1696210.0, ans=0.1 2024-08-12 15:15:28,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1696410.0, ans=0.2 2024-08-12 15:15:38,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1696510.0, ans=0.1 2024-08-12 15:15:45,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1696610.0, ans=0.1 2024-08-12 15:15:46,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10250, loss[loss=0.0952, beats_loss=0.01356, ecapa_loss=0.0001768, whisper_loss=0.07987, over 21010.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001753, whisper_loss=0.09153, over 3850925.76 frames. ], batch size: 86, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:15:53,197 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 15:16:10,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1696710.0, ans=0.09899494936611666 2024-08-12 15:16:30,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1696910.0, ans=0.125 2024-08-12 15:16:30,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. 
limit=15.0 2024-08-12 15:16:37,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1696910.0, ans=0.2 2024-08-12 15:16:37,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-08-12 15:16:38,269 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 15:16:39,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2024-08-12 15:16:49,713 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 15:16:51,356 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 15:16:54,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1697010.0, ans=0.125 2024-08-12 15:16:57,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10300, loss[loss=0.1117, beats_loss=0.009432, ecapa_loss=0.0002192, whisper_loss=0.1, over 21705.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001759, whisper_loss=0.09116, over 3863520.89 frames. ], batch size: 91, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:17:16,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.570e+01 2.801e+01 3.230e+01 4.716e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-12 15:17:27,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1697310.0, ans=0.125 2024-08-12 15:18:04,198 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 15:18:09,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10350, loss[loss=0.1236, beats_loss=0.0116, ecapa_loss=0.0001313, whisper_loss=0.1107, over 24314.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001761, whisper_loss=0.09136, over 3881042.84 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:18:15,025 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-12 15:18:23,350 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 26 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-12 15:18:29,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1697710.0, ans=0.125 2024-08-12 15:18:45,267 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 15:18:45,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1697810.0, ans=0.07 2024-08-12 15:18:53,166 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 13 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 15:19:05,517 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 15:19:17,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10400, loss[loss=0.1032, beats_loss=0.01138, ecapa_loss=0.0001487, whisper_loss=0.09034, over 18456.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001751, whisper_loss=0.09114, over 3850640.89 frames. 
], batch size: 72, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:19:19,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1698110.0, ans=0.1 2024-08-12 15:19:31,949 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:19:35,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.431e+01 2.766e+01 3.090e+01 4.882e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 15:19:39,358 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 15:19:56,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1698410.0, ans=0.0 2024-08-12 15:20:11,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1698510.0, ans=0.125 2024-08-12 15:20:13,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-12 15:20:20,765 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.129e+05 2024-08-12 15:20:24,517 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10450, loss[loss=0.0922, beats_loss=0.01177, ecapa_loss=0.0001746, whisper_loss=0.07868, over 17978.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001749, whisper_loss=0.09136, over 3849047.65 frames. 
], batch size: 76, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:20:25,088 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.055e+00 2024-08-12 15:20:28,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=29.53 vs. limit=15.0 2024-08-12 15:20:30,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1698610.0, ans=0.125 2024-08-12 15:20:38,539 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 15:20:39,693 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 15:21:25,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-08-12 15:21:26,359 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 15:21:32,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10500, loss[loss=0.08875, beats_loss=0.01068, ecapa_loss=0.0002588, whisper_loss=0.07548, over 16308.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001749, whisper_loss=0.09141, over 3854163.90 frames. ], batch size: 76, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:21:41,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-12 15:21:48,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2024-08-12 15:21:50,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.539e+01 2.734e+01 3.108e+01 4.878e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 15:21:53,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1699210.0, ans=0.125 2024-08-12 15:21:56,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1699210.0, ans=0.125 2024-08-12 15:22:04,160 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 15:22:11,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1699310.0, ans=0.04949747468305833 2024-08-12 15:22:23,235 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 15:22:32,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1699510.0, ans=0.2 2024-08-12 15:22:33,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1699510.0, ans=0.0 2024-08-12 15:22:40,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10550, loss[loss=0.09251, beats_loss=0.01037, ecapa_loss=0.0001786, whisper_loss=0.08036, over 14497.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001751, whisper_loss=0.09109, over 3825621.95 frames. ], batch size: 57, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:23:09,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2024-08-12 15:23:15,308 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 15:23:24,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=12.0 2024-08-12 15:23:26,824 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 15:23:40,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=12.0 2024-08-12 15:23:44,143 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 15:23:47,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=15.0 2024-08-12 15:23:49,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1700010.0, ans=0.125 2024-08-12 15:23:52,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10600, loss[loss=0.0872, beats_loss=0.01304, ecapa_loss=0.0001545, whisper_loss=0.07261, over 21903.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.0001744, whisper_loss=0.09103, over 3845065.22 frames. ], batch size: 87, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:23:57,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1700110.0, ans=0.0 2024-08-12 15:24:05,535 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 15:24:13,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.487e+01 2.727e+01 3.054e+01 5.238e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 15:24:21,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1700310.0, ans=0.125 2024-08-12 15:24:26,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1700310.0, ans=0.05 2024-08-12 15:24:48,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1700410.0, ans=0.1 2024-08-12 15:24:57,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1700510.0, ans=0.0 2024-08-12 15:25:07,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10650, loss[loss=0.08822, beats_loss=0.0132, ecapa_loss=0.0001686, whisper_loss=0.07334, over 22074.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01117, ecapa_loss=0.0001736, whisper_loss=0.09037, over 3833321.59 frames. ], batch size: 91, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:25:13,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1700610.0, ans=0.2 2024-08-12 15:25:25,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1700710.0, ans=0.125 2024-08-12 15:25:30,176 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 15:25:35,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1700810.0, ans=0.125 2024-08-12 15:25:39,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1700810.0, ans=0.125 2024-08-12 15:25:41,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1700810.0, ans=0.125 2024-08-12 15:25:46,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1700810.0, ans=0.1 2024-08-12 15:25:46,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1700810.0, ans=0.2 2024-08-12 15:25:49,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1700810.0, ans=0.0 2024-08-12 15:25:54,546 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 15:25:58,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1700910.0, ans=0.125 2024-08-12 15:26:03,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1700910.0, ans=0.0 2024-08-12 15:26:09,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1701010.0, ans=0.1 2024-08-12 15:26:12,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-12 15:26:18,946 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 15:26:20,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10700, loss[loss=0.1029, beats_loss=0.01198, ecapa_loss=0.0001781, whisper_loss=0.08913, over 18940.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01115, ecapa_loss=0.0001727, whisper_loss=0.09104, over 3849871.35 frames. ], batch size: 78, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:26:39,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.524e+01 2.760e+01 3.145e+01 5.039e+01, threshold=5.520e+01, percent-clipped=0.0 2024-08-12 15:26:50,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1701310.0, ans=0.1 2024-08-12 15:26:51,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1701310.0, ans=0.125 2024-08-12 15:27:07,872 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 15:27:22,889 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 15:27:27,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10750, loss[loss=0.109, beats_loss=0.008506, ecapa_loss=0.0001941, whisper_loss=0.09856, over 14928.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001726, whisper_loss=0.09189, over 3876392.24 frames. ], batch size: 60, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:27:47,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-12 15:28:03,401 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 15:28:09,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1701910.0, ans=0.025 2024-08-12 15:28:10,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1701910.0, ans=0.125 2024-08-12 15:28:16,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1701910.0, ans=0.0 2024-08-12 15:28:33,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-12 15:28:35,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10800, loss[loss=0.094, beats_loss=0.01251, ecapa_loss=0.0001611, whisper_loss=0.07988, over 17794.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01108, ecapa_loss=0.0001733, whisper_loss=0.09136, over 3864219.60 frames. ], batch size: 70, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:28:54,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.536e+01 2.905e+01 3.267e+01 1.637e+02, threshold=5.810e+01, percent-clipped=2.0 2024-08-12 15:28:54,592 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 15:28:59,755 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 12 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 15:29:00,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. 
limit=15.0 2024-08-12 15:29:09,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1702310.0, ans=0.2 2024-08-12 15:29:09,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-12 15:29:18,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1702410.0, ans=0.0 2024-08-12 15:29:30,496 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 15:29:31,852 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 15:29:33,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1702510.0, ans=0.1 2024-08-12 15:29:41,502 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 15:29:41,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1702610.0, ans=0.125 2024-08-12 15:29:42,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10850, loss[loss=0.1155, beats_loss=0.009673, ecapa_loss=0.0001618, whisper_loss=0.1042, over 17106.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.0001749, whisper_loss=0.09195, over 3879108.27 frames. ], batch size: 64, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:29:43,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. 
limit=15.0 2024-08-12 15:29:44,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1702610.0, ans=0.125 2024-08-12 15:29:49,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1702610.0, ans=0.0 2024-08-12 15:30:02,514 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 15:30:21,466 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 15:30:24,218 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 15:30:33,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1702910.0, ans=0.1 2024-08-12 15:30:34,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1702910.0, ans=0.1 2024-08-12 15:30:38,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1703010.0, ans=0.125 2024-08-12 15:30:39,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1703010.0, ans=0.125 2024-08-12 15:30:42,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2024-08-12 15:30:43,454 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 15:30:50,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10900, loss[loss=0.09114, beats_loss=0.01106, ecapa_loss=0.0001815, whisper_loss=0.07826, over 17454.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.000175, whisper_loss=0.09148, over 3891154.06 frames. 
], batch size: 72, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:30:52,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1703110.0, ans=0.125 2024-08-12 15:30:54,245 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-12 15:30:57,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2024-08-12 15:31:00,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1703110.0, ans=0.125 2024-08-12 15:31:03,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-12 15:31:08,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1703210.0, ans=0.2 2024-08-12 15:31:08,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.493e+01 2.855e+01 3.171e+01 4.648e+01, threshold=5.710e+01, percent-clipped=0.0 2024-08-12 15:31:15,369 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 15:31:20,276 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.411e+01 2024-08-12 15:31:24,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.09 vs. 
limit=15.0 2024-08-12 15:31:36,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1703410.0, ans=0.0 2024-08-12 15:31:42,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1703510.0, ans=0.125 2024-08-12 15:31:43,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1703510.0, ans=0.0 2024-08-12 15:31:56,061 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 15:31:57,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 10950, loss[loss=0.1235, beats_loss=0.009634, ecapa_loss=0.0002025, whisper_loss=0.1119, over 18100.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01099, ecapa_loss=0.000175, whisper_loss=0.09234, over 3900947.25 frames. ], batch size: 72, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:32:00,136 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 15:32:02,978 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 6 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 15:32:21,425 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 15:32:37,542 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:32:37,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-12 15:32:43,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1703810.0, ans=0.0 2024-08-12 15:32:44,513 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
38 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 15:33:06,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1704010.0, ans=0.0 2024-08-12 15:33:09,400 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 15:33:11,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-12 15:33:13,207 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11000, loss[loss=0.1104, beats_loss=0.0112, ecapa_loss=0.0001845, whisper_loss=0.09737, over 22954.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001764, whisper_loss=0.09253, over 3890595.74 frames. ], batch size: 92, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:33:32,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.453e+01 2.776e+01 3.261e+01 5.617e+01, threshold=5.552e+01, percent-clipped=0.0 2024-08-12 15:33:37,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1704210.0, ans=0.125 2024-08-12 15:33:49,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1704310.0, ans=0.1 2024-08-12 15:33:53,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1704410.0, ans=0.1 2024-08-12 15:33:59,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1704410.0, ans=0.125 2024-08-12 15:34:21,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11050, loss[loss=0.07988, beats_loss=0.01247, ecapa_loss=0.0002258, whisper_loss=0.06515, over 20347.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0001757, whisper_loss=0.09277, over 3888497.97 frames. ], batch size: 94, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:34:34,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1704710.0, ans=0.0 2024-08-12 15:34:35,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1704710.0, ans=0.1 2024-08-12 15:34:35,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1704710.0, ans=0.0 2024-08-12 15:34:40,516 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 15:34:47,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1704810.0, ans=0.2 2024-08-12 15:34:47,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1704810.0, ans=0.0 2024-08-12 15:34:47,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1704810.0, ans=0.0 2024-08-12 15:35:05,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1704910.0, ans=0.0 2024-08-12 15:35:15,377 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
29 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 15:35:22,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1705010.0, ans=0.2 2024-08-12 15:35:25,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1705010.0, ans=0.2 2024-08-12 15:35:29,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11100, loss[loss=0.09092, beats_loss=0.01169, ecapa_loss=0.0001684, whisper_loss=0.07754, over 18913.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001753, whisper_loss=0.09224, over 3884631.25 frames. ], batch size: 74, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:35:39,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1705110.0, ans=0.0 2024-08-12 15:35:39,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1705110.0, ans=0.125 2024-08-12 15:35:48,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.398e+01 2.655e+01 3.117e+01 6.342e+01, threshold=5.309e+01, percent-clipped=1.0 2024-08-12 15:35:58,946 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-12 15:36:34,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=12.0 2024-08-12 15:36:37,269 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 15:36:38,739 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11150, loss[loss=0.08104, beats_loss=0.0102, ecapa_loss=0.0001998, whisper_loss=0.06883, over 13675.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01097, ecapa_loss=0.0001734, whisper_loss=0.09245, over 3882246.96 frames. 
], batch size: 55, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:36:40,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1705610.0, ans=0.125 2024-08-12 15:36:51,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1705710.0, ans=0.1 2024-08-12 15:37:13,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1705810.0, ans=0.125 2024-08-12 15:37:21,525 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-12 15:37:29,663 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 15:37:41,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0 2024-08-12 15:37:43,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-12 15:37:46,071 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11200, loss[loss=0.1011, beats_loss=0.01085, ecapa_loss=0.0002198, whisper_loss=0.08804, over 19420.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001755, whisper_loss=0.09256, over 3893613.68 frames. 
], batch size: 79, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:38:03,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1706210.0, ans=0.125 2024-08-12 15:38:05,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.490e+01 2.836e+01 3.047e+01 5.086e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 15:38:07,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-08-12 15:38:17,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1706310.0, ans=0.0 2024-08-12 15:38:27,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1706410.0, ans=0.09899494936611666 2024-08-12 15:38:33,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1706410.0, ans=0.125 2024-08-12 15:38:35,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1706410.0, ans=0.125 2024-08-12 15:38:50,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1706510.0, ans=0.02 2024-08-12 15:38:53,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11250, loss[loss=0.09598, beats_loss=0.01077, ecapa_loss=0.0002103, whisper_loss=0.0831, over 18402.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0001776, whisper_loss=0.09219, over 3880014.61 frames. 
], batch size: 75, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:38:59,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1706610.0, ans=0.125 2024-08-12 15:39:06,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2024-08-12 15:39:49,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1707010.0, ans=0.0 2024-08-12 15:39:53,460 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 15:40:01,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11300, loss[loss=0.1002, beats_loss=0.01205, ecapa_loss=0.0001281, whisper_loss=0.0869, over 23573.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001757, whisper_loss=0.09162, over 3863537.56 frames. ], batch size: 91, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:40:18,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-12 15:40:20,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.567e+01 2.768e+01 3.157e+01 8.223e+01, threshold=5.536e+01, percent-clipped=2.0 2024-08-12 15:40:41,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1707410.0, ans=0.1 2024-08-12 15:40:46,627 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 15:41:03,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1707510.0, ans=0.2 2024-08-12 15:41:10,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11350, loss[loss=0.132, beats_loss=0.008015, ecapa_loss=0.0001735, whisper_loss=0.1222, over 15805.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.000174, whisper_loss=0.09198, over 3912538.16 frames. ], batch size: 60, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:41:12,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-12 15:41:21,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2024-08-12 15:41:30,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1707710.0, ans=10.0 2024-08-12 15:41:36,745 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 15:41:59,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.92 vs. limit=22.5 2024-08-12 15:42:08,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1708010.0, ans=0.0 2024-08-12 15:42:17,872 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11400, loss[loss=0.09947, beats_loss=0.01172, ecapa_loss=0.0001607, whisper_loss=0.08615, over 22377.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001742, whisper_loss=0.09153, over 3855816.43 frames. 
], batch size: 91, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:42:25,748 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 15:42:27,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1708110.0, ans=0.125 2024-08-12 15:42:36,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.714e+01 3.019e+01 3.288e+01 4.590e+01, threshold=6.038e+01, percent-clipped=0.0 2024-08-12 15:42:50,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1708310.0, ans=0.125 2024-08-12 15:43:07,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1708410.0, ans=0.125 2024-08-12 15:43:08,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1708410.0, ans=0.07 2024-08-12 15:43:15,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1708510.0, ans=0.05 2024-08-12 15:43:19,707 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.067e+05 2024-08-12 15:43:21,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1708510.0, ans=0.0 2024-08-12 15:43:25,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11450, loss[loss=0.09987, beats_loss=0.01124, ecapa_loss=0.0002141, whisper_loss=0.08649, over 17344.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001752, whisper_loss=0.09185, over 3850292.73 frames. 
], batch size: 73, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:43:34,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1708610.0, ans=0.1 2024-08-12 15:43:36,452 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 15:43:38,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-12 15:43:54,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1708810.0, ans=0.0 2024-08-12 15:44:13,783 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-12 15:44:17,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1708910.0, ans=0.2 2024-08-12 15:44:23,566 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 15:44:34,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11500, loss[loss=0.1126, beats_loss=0.01012, ecapa_loss=0.0002084, whisper_loss=0.1004, over 19379.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001741, whisper_loss=0.0913, over 3852187.26 frames. ], batch size: 80, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:44:35,879 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 27 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-12 15:44:36,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2024-08-12 15:44:43,994 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 15:44:54,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.425e+01 2.764e+01 3.070e+01 5.781e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-12 15:44:55,552 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 15:44:58,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1709210.0, ans=0.125 2024-08-12 15:45:06,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1709310.0, ans=0.125 2024-08-12 15:45:18,365 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 15:45:26,108 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.777e-03 2024-08-12 15:45:47,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11550, loss[loss=0.0985, beats_loss=0.01092, ecapa_loss=0.0001372, whisper_loss=0.08621, over 21757.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01091, ecapa_loss=0.000174, whisper_loss=0.09211, over 3846378.03 frames. ], batch size: 82, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:45:50,998 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 15:46:20,161 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.754e+00 2024-08-12 15:46:20,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.25 vs. 
limit=22.5 2024-08-12 15:47:01,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1710010.0, ans=0.125 2024-08-12 15:47:04,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11600, loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001284, whisper_loss=0.0929, over 17233.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01088, ecapa_loss=0.0001745, whisper_loss=0.09258, over 3875904.39 frames. ], batch size: 66, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:47:04,496 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 15:47:08,335 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 15:47:32,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.592e+01 2.931e+01 3.257e+01 5.066e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 15:47:38,139 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 15:47:42,546 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 15:47:47,322 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-12 15:47:48,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1710310.0, ans=0.125 2024-08-12 15:47:53,030 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 15:47:59,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1710310.0, ans=0.125 2024-08-12 15:48:23,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1710410.0, ans=0.0 2024-08-12 15:48:48,931 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-12 15:48:51,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11650, loss[loss=0.1005, beats_loss=0.01293, ecapa_loss=0.0001389, whisper_loss=0.08619, over 23545.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001747, whisper_loss=0.09239, over 3882610.19 frames. ], batch size: 91, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:49:05,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1710610.0, ans=0.1 2024-08-12 15:49:11,602 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 15:49:15,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1710710.0, ans=0.125 2024-08-12 15:49:21,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1710710.0, ans=0.0 2024-08-12 15:49:34,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1710710.0, ans=0.0 2024-08-12 15:49:48,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-12 15:50:02,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. 
limit=6.0 2024-08-12 15:50:12,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2024-08-12 15:50:18,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1710910.0, ans=0.125 2024-08-12 15:50:27,611 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 15:50:35,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-12 15:50:51,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1711010.0, ans=0.125 2024-08-12 15:51:06,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11700, loss[loss=0.1107, beats_loss=0.008655, ecapa_loss=0.0001771, whisper_loss=0.1003, over 18375.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001755, whisper_loss=0.09192, over 3877786.77 frames. ], batch size: 68, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:51:09,116 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-12 15:51:17,188 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 15:51:33,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1711210.0, ans=0.0 2024-08-12 15:51:45,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.679e+01 3.031e+01 3.384e+01 8.068e+01, threshold=6.063e+01, percent-clipped=1.0 2024-08-12 15:52:15,866 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 15:52:18,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1711310.0, ans=0.07 2024-08-12 15:52:21,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1711310.0, ans=0.1 2024-08-12 15:52:22,282 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 15:52:43,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-08-12 15:53:20,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11750, loss[loss=0.09849, beats_loss=0.01112, ecapa_loss=0.0001815, whisper_loss=0.08556, over 22230.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01112, ecapa_loss=0.0001749, whisper_loss=0.09133, over 3906178.42 frames. ], batch size: 92, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:53:52,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1711710.0, ans=0.125 2024-08-12 15:54:20,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1711810.0, ans=0.125 2024-08-12 15:54:28,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=12.0 2024-08-12 15:54:38,841 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 15:54:48,537 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 15:54:49,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1712010.0, ans=0.125 2024-08-12 15:55:02,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11800, loss[loss=0.122, beats_loss=0.009274, ecapa_loss=0.0001556, whisper_loss=0.1112, over 15071.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001743, whisper_loss=0.09218, over 3912290.68 frames. ], batch size: 55, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:55:06,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2024-08-12 15:55:10,207 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-12 15:55:11,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1712110.0, ans=0.2 2024-08-12 15:55:21,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1712210.0, ans=0.0 2024-08-12 15:55:27,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1712210.0, ans=0.2 2024-08-12 15:55:30,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.421e+01 2.823e+01 3.255e+01 8.063e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 15:55:34,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1712210.0, ans=0.125 2024-08-12 15:56:12,432 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 15:56:15,973 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 15:56:31,274 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11850, loss[loss=0.09797, beats_loss=0.01009, ecapa_loss=0.0001638, whisper_loss=0.08624, over 15122.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001738, whisper_loss=0.09283, over 3909333.14 frames. ], batch size: 58, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:56:43,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1712610.0, ans=0.125 2024-08-12 15:56:49,361 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 15:56:59,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1712710.0, ans=0.0 2024-08-12 15:57:18,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1712810.0, ans=0.2 2024-08-12 15:57:58,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11900, loss[loss=0.1095, beats_loss=0.01144, ecapa_loss=0.00014, whisper_loss=0.09664, over 23797.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01114, ecapa_loss=0.0001731, whisper_loss=0.09202, over 3907076.90 frames. 
], batch size: 93, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:58:03,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1713110.0, ans=0.1 2024-08-12 15:58:06,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1713110.0, ans=0.1 2024-08-12 15:58:12,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1713110.0, ans=0.1 2024-08-12 15:58:23,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1713210.0, ans=0.125 2024-08-12 15:58:24,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.471e+01 2.746e+01 3.069e+01 1.141e+02, threshold=5.492e+01, percent-clipped=1.0 2024-08-12 15:58:40,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1713310.0, ans=0.04949747468305833 2024-08-12 15:58:43,609 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 15:58:46,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1713310.0, ans=0.2 2024-08-12 15:58:54,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.42 vs. limit=22.5 2024-08-12 15:59:07,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1713510.0, ans=0.125 2024-08-12 15:59:19,540 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 22 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-12 15:59:22,587 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 15:59:24,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 11950, loss[loss=0.1044, beats_loss=0.009893, ecapa_loss=0.0002077, whisper_loss=0.09242, over 14589.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001741, whisper_loss=0.09222, over 3893490.25 frames. ], batch size: 57, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:59:33,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1713610.0, ans=0.1 2024-08-12 15:59:47,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1713710.0, ans=0.125 2024-08-12 16:00:03,428 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 16:00:09,396 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-12 16:00:18,436 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 16:00:40,340 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 16:00:50,428 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12000, loss[loss=0.07767, beats_loss=0.01326, ecapa_loss=0.0001514, whisper_loss=0.06289, over 14301.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001749, whisper_loss=0.09201, over 3873313.01 frames. ], batch size: 59, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:00:50,428 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 16:01:32,501 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005955, whisper_loss=0.2482, over 922467.00 frames. 
2024-08-12 16:01:51,975 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on SV_voxceleb1: loss=0.004759, beats_loss=0, ecapa_loss=0.0004759, whisper_loss=0, over 939242.00 frames. 2024-08-12 16:03:43,597 INFO [train_multi_KD3.py:1149] (3/4) Epoch 12, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 16:03:43,602 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 16:03:46,645 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 22 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-12 16:03:47,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-12 16:03:49,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1714110.0, ans=0.125 2024-08-12 16:04:06,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.437e+01 2.734e+01 3.186e+01 7.564e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 16:04:19,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1714310.0, ans=0.04949747468305833 2024-08-12 16:04:20,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1714310.0, ans=0.2 2024-08-12 16:04:21,742 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-12 16:04:50,545 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-12 16:04:59,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12050, loss[loss=0.1045, beats_loss=0.009162, ecapa_loss=0.0001759, whisper_loss=0.09355, over 15235.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01111, ecapa_loss=0.0001739, whisper_loss=0.0914, over 3886610.42 frames. ], batch size: 61, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:05:06,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1714610.0, ans=0.0 2024-08-12 16:05:15,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1714710.0, ans=0.2 2024-08-12 16:05:17,609 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 16:05:18,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1714710.0, ans=0.125 2024-08-12 16:05:49,997 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 16:06:08,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1715010.0, ans=0.125 2024-08-12 16:06:13,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1715010.0, ans=0.0 2024-08-12 16:06:15,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12100, loss[loss=0.1082, beats_loss=0.01109, ecapa_loss=0.0002087, whisper_loss=0.09498, over 22222.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0111, ecapa_loss=0.0001739, whisper_loss=0.09114, over 3900037.61 frames. 
], batch size: 94, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:06:36,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1715210.0, ans=0.125 2024-08-12 16:06:38,377 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.371e+01 2.653e+01 2.949e+01 4.098e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-12 16:06:42,121 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 16:07:05,339 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 16:07:23,485 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 16:07:34,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1715510.0, ans=15.0 2024-08-12 16:07:36,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12150, loss[loss=0.09414, beats_loss=0.009507, ecapa_loss=0.0002165, whisper_loss=0.08247, over 18717.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001745, whisper_loss=0.09131, over 3897306.06 frames. ], batch size: 78, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:07:37,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1715610.0, ans=0.125 2024-08-12 16:07:38,137 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 16:08:05,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2024-08-12 16:08:14,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2024-08-12 16:08:33,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1715910.0, ans=0.125 2024-08-12 16:08:33,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1715910.0, ans=0.0 2024-08-12 16:08:51,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12200, loss[loss=0.1096, beats_loss=0.009986, ecapa_loss=0.0001671, whisper_loss=0.09795, over 23644.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01105, ecapa_loss=0.0001747, whisper_loss=0.0907, over 3876141.37 frames. ], batch size: 94, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:08:58,345 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 16:09:07,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-12 16:09:12,990 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 16:09:13,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.462e+01 2.887e+01 3.237e+01 1.771e+02, threshold=5.773e+01, percent-clipped=2.0 2024-08-12 16:09:21,797 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 16:09:23,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1716310.0, ans=0.125 2024-08-12 16:09:24,603 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 16:09:50,355 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 16:09:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1716510.0, ans=0.125 2024-08-12 16:10:00,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1716510.0, ans=0.125 2024-08-12 16:10:07,303 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12250, loss[loss=0.1081, beats_loss=0.01073, ecapa_loss=0.000198, whisper_loss=0.09539, over 21513.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0111, ecapa_loss=0.0001742, whisper_loss=0.09036, over 3872601.31 frames. ], batch size: 90, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:10:10,882 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 16:10:15,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 16:10:27,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1716710.0, ans=0.1 2024-08-12 16:10:33,037 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 16:10:34,689 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 16:10:36,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1716710.0, ans=0.0 2024-08-12 16:10:39,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1716810.0, ans=0.125 2024-08-12 16:10:40,700 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 16:10:55,833 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-12 16:11:00,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1716910.0, ans=0.0 2024-08-12 16:11:06,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1716910.0, ans=0.09899494936611666 2024-08-12 16:11:06,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2024-08-12 16:11:27,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12300, loss[loss=0.0983, beats_loss=0.01111, ecapa_loss=0.0001691, whisper_loss=0.0855, over 16674.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001757, whisper_loss=0.09104, over 3852915.33 frames. ], batch size: 63, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:11:29,612 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 16:11:38,161 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 16:11:43,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-12 16:11:52,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.615e+01 2.930e+01 3.275e+01 9.862e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 16:12:19,282 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:12:25,795 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 16:12:42,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-08-12 16:12:48,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1717510.0, ans=0.0 2024-08-12 16:12:49,678 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 16:12:51,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12350, loss[loss=0.08425, beats_loss=0.01384, ecapa_loss=0.0001687, whisper_loss=0.06872, over 16604.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01098, ecapa_loss=0.000176, whisper_loss=0.09077, over 3865185.33 frames. ], batch size: 70, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:13:11,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1717710.0, ans=0.0 2024-08-12 16:13:15,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1717710.0, ans=0.2 2024-08-12 16:13:23,368 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 16:13:34,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1717810.0, ans=0.125 2024-08-12 16:13:34,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1717810.0, ans=0.2 2024-08-12 16:13:36,283 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 16:13:53,980 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 16:14:12,870 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 16:14:14,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12400, loss[loss=0.08989, beats_loss=0.009537, ecapa_loss=0.0001683, whisper_loss=0.07867, over 16582.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01098, ecapa_loss=0.0001758, whisper_loss=0.09056, over 3873014.91 frames. ], batch size: 66, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:14:16,648 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 16:14:36,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1718210.0, ans=0.0 2024-08-12 16:14:40,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.684e+01 3.067e+01 3.396e+01 5.308e+01, threshold=6.133e+01, percent-clipped=1.0 2024-08-12 16:14:56,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1718310.0, ans=0.125 2024-08-12 16:15:01,039 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 16:15:16,085 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 16:15:28,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1718510.0, ans=0.05 2024-08-12 16:15:29,340 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 16:15:36,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12450, loss[loss=0.1032, beats_loss=0.01189, ecapa_loss=0.0001664, whisper_loss=0.08966, over 20194.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.000175, whisper_loss=0.0911, over 3866585.53 frames. 
], batch size: 82, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:15:49,448 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 16:15:49,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1718610.0, ans=0.125 2024-08-12 16:16:30,309 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 16:16:52,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1719010.0, ans=0.2 2024-08-12 16:16:56,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12500, loss[loss=0.1463, beats_loss=0.006348, ecapa_loss=0.000222, whisper_loss=0.1378, over 15595.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001751, whisper_loss=0.09171, over 3869479.60 frames. ], batch size: 59, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:17:01,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1719110.0, ans=0.1 2024-08-12 16:17:07,123 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 16:17:08,473 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 16:17:19,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.385e+01 2.736e+01 3.208e+01 9.127e+01, threshold=5.473e+01, percent-clipped=1.0 2024-08-12 16:17:27,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1719310.0, ans=0.125 2024-08-12 16:17:50,748 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 16:18:14,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1719510.0, ans=0.0 2024-08-12 16:18:14,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1719510.0, ans=0.2 2024-08-12 16:18:16,534 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12550, loss[loss=0.1128, beats_loss=0.01062, ecapa_loss=0.000186, whisper_loss=0.1004, over 22366.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001747, whisper_loss=0.09182, over 3862618.07 frames. ], batch size: 90, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:18:40,876 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:18:40,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1719710.0, ans=0.125 2024-08-12 16:18:48,200 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 16:18:51,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1719810.0, ans=0.0 2024-08-12 16:19:38,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12600, loss[loss=0.1068, beats_loss=0.01006, ecapa_loss=0.0001881, whisper_loss=0.09485, over 23407.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001746, whisper_loss=0.09236, over 3906194.12 frames. ], batch size: 95, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:19:50,752 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 16:20:03,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.583e+01 2.914e+01 3.404e+01 5.799e+01, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 16:20:05,971 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 16:20:13,822 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 16:20:19,652 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 16:20:35,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1720410.0, ans=0.125 2024-08-12 16:20:39,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1720410.0, ans=0.2 2024-08-12 16:20:40,270 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-12 16:20:43,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1720510.0, ans=0.1 2024-08-12 16:20:48,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1720510.0, ans=0.1 2024-08-12 16:20:50,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1720510.0, ans=0.1 2024-08-12 16:20:51,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1720510.0, ans=0.0 2024-08-12 16:20:58,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12650, loss[loss=0.107, beats_loss=0.009934, ecapa_loss=0.0001826, whisper_loss=0.09519, over 23324.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001748, whisper_loss=0.09166, over 3890335.50 frames. 
], batch size: 91, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:21:05,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1720610.0, ans=0.0 2024-08-12 16:21:07,057 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 16:21:07,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1720610.0, ans=0.025 2024-08-12 16:21:14,793 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 16:21:15,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-08-12 16:21:17,850 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.843e-03 2024-08-12 16:22:07,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1721010.0, ans=0.2 2024-08-12 16:22:15,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1721110.0, ans=0.0 2024-08-12 16:22:16,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12700, loss[loss=0.07655, beats_loss=0.01464, ecapa_loss=0.0001407, whisper_loss=0.06051, over 22166.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001753, whisper_loss=0.09177, over 3897520.36 frames. 
], batch size: 90, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:22:27,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1721110.0, ans=0.07 2024-08-12 16:22:34,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:22:37,681 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 16 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 16:22:39,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:22:40,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.411e+01 2.657e+01 2.975e+01 5.020e+01, threshold=5.313e+01, percent-clipped=0.0 2024-08-12 16:22:40,305 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 16:22:43,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:23:27,409 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:23:35,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12750, loss[loss=0.09938, beats_loss=0.01064, ecapa_loss=0.0002088, whisper_loss=0.08665, over 18521.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01106, ecapa_loss=0.0001754, whisper_loss=0.09168, over 3910468.54 frames. 
], batch size: 77, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:23:47,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.986e-02 2024-08-12 16:23:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1721710.0, ans=0.2 2024-08-12 16:23:58,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-12 16:24:00,981 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 16:24:01,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1721710.0, ans=0.0 2024-08-12 16:24:12,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1721810.0, ans=0.125 2024-08-12 16:24:32,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1721910.0, ans=0.125 2024-08-12 16:24:34,936 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:24:35,853 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-12 16:24:50,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1722010.0, ans=0.1 2024-08-12 16:24:57,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1722110.0, ans=0.1 2024-08-12 16:24:57,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12800, loss[loss=0.08841, beats_loss=0.01239, ecapa_loss=0.000157, whisper_loss=0.07446, over 16981.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01117, ecapa_loss=0.0001761, whisper_loss=0.09085, over 3906272.74 frames. ], batch size: 67, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:24:59,737 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 16:25:00,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-12 16:25:05,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1722110.0, ans=0.125 2024-08-12 16:25:08,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1722110.0, ans=0.125 2024-08-12 16:25:09,240 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 16:25:21,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.602e+01 2.886e+01 3.279e+01 7.661e+01, threshold=5.773e+01, percent-clipped=1.0 2024-08-12 16:25:24,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-12 16:25:27,731 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 16:25:35,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1722310.0, ans=0.0 2024-08-12 16:25:58,755 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 16:26:06,756 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 16:26:07,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1722510.0, ans=0.0 2024-08-12 16:26:18,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12850, loss[loss=0.122, beats_loss=0.009067, ecapa_loss=0.0001771, whisper_loss=0.1111, over 19135.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01112, ecapa_loss=0.0001772, whisper_loss=0.09066, over 3869456.26 frames. ], batch size: 76, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:26:36,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:26:44,094 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 16:26:44,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:26:44,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1722710.0, ans=0.2 2024-08-12 16:26:45,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:26:51,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1722810.0, ans=0.125 2024-08-12 16:27:01,435 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 16:27:01,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1722810.0, ans=0.1 2024-08-12 16:27:08,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1722910.0, ans=0.1 2024-08-12 16:27:15,290 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 16:27:30,726 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-12 16:27:40,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-12 16:27:40,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12900, loss[loss=0.07974, beats_loss=0.01069, ecapa_loss=0.000163, whisper_loss=0.06742, over 14552.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01104, ecapa_loss=0.0001765, whisper_loss=0.09096, over 3856062.80 frames. ], batch size: 58, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:27:42,622 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 16:27:42,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1723110.0, ans=0.125 2024-08-12 16:28:02,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1723210.0, ans=0.2 2024-08-12 16:28:05,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.675e+01 2.950e+01 4.604e+01, threshold=5.350e+01, percent-clipped=0.0 2024-08-12 16:28:09,105 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 16:28:50,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1723510.0, ans=0.125 2024-08-12 16:28:57,513 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 16:29:03,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 12950, loss[loss=0.08541, beats_loss=0.01204, ecapa_loss=0.0001152, whisper_loss=0.07222, over 14489.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001757, whisper_loss=0.09115, over 3860774.88 frames. ], batch size: 54, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:29:28,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1723710.0, ans=15.0 2024-08-12 16:29:33,352 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 16:29:47,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1723810.0, ans=0.2 2024-08-12 16:30:16,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1724010.0, ans=0.0 2024-08-12 16:30:18,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1724010.0, ans=0.125 2024-08-12 16:30:25,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1724010.0, ans=0.0 2024-08-12 16:30:25,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1724010.0, ans=0.04949747468305833 2024-08-12 16:30:28,768 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 16:30:30,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13000, loss[loss=0.1109, beats_loss=0.009988, ecapa_loss=0.0001907, whisper_loss=0.09896, over 16543.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001747, whisper_loss=0.09125, over 3860726.35 frames. ], batch size: 61, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:30:31,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1724110.0, ans=0.125 2024-08-12 16:30:34,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1724110.0, ans=0.0 2024-08-12 16:30:43,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1724110.0, ans=0.1 2024-08-12 16:30:55,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.537e+01 2.771e+01 3.073e+01 6.149e+01, threshold=5.541e+01, percent-clipped=2.0 2024-08-12 16:30:56,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1724210.0, ans=0.125 2024-08-12 16:31:15,523 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 16:31:26,557 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 16:31:30,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1724410.0, ans=0.125 2024-08-12 16:31:46,788 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 16:31:51,879 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 16:31:53,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1724610.0, ans=0.125 2024-08-12 16:31:54,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13050, loss[loss=0.1018, beats_loss=0.01121, ecapa_loss=0.0001476, whisper_loss=0.08915, over 22609.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01106, ecapa_loss=0.0001748, whisper_loss=0.09078, over 3872833.57 frames. ], batch size: 87, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:32:02,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1724610.0, ans=0.0 2024-08-12 16:32:03,009 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 16:32:17,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-12 16:32:21,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1724710.0, ans=0.1 2024-08-12 16:32:26,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1724810.0, ans=0.125 2024-08-12 16:32:43,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1724910.0, ans=0.1 2024-08-12 16:32:45,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2024-08-12 16:32:51,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-12 16:33:17,073 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 16:33:17,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13100, loss[loss=0.09819, beats_loss=0.01075, ecapa_loss=0.0001976, whisper_loss=0.08546, over 18694.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01104, ecapa_loss=0.0001746, whisper_loss=0.09098, over 3877674.73 frames. ], batch size: 78, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:33:18,254 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 16:33:26,012 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 16:33:35,710 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 16:33:41,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.633e+01 2.841e+01 3.164e+01 5.259e+01, threshold=5.682e+01, percent-clipped=0.0 2024-08-12 16:33:48,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1725310.0, ans=0.125 2024-08-12 16:34:12,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1725410.0, ans=0.125 2024-08-12 16:34:17,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1725410.0, ans=0.0 2024-08-12 16:34:19,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1725410.0, ans=0.2 2024-08-12 16:34:25,851 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 16:34:29,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1725510.0, ans=0.0 2024-08-12 16:34:38,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13150, loss[loss=0.09969, beats_loss=0.01096, ecapa_loss=0.000201, whisper_loss=0.08672, over 15729.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01118, ecapa_loss=0.0001732, whisper_loss=0.09032, over 3895267.48 frames. ], batch size: 66, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:34:43,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2024-08-12 16:34:52,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1725610.0, ans=0.125 2024-08-12 16:35:01,137 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 16:35:11,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1725710.0, ans=0.0 2024-08-12 16:35:17,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1725810.0, ans=0.0 2024-08-12 16:35:29,786 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 16:35:53,818 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 16:35:59,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1726010.0, ans=0.2 2024-08-12 16:36:02,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13200, loss[loss=0.08948, beats_loss=0.009608, ecapa_loss=0.0002478, whisper_loss=0.07739, over 16851.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01107, ecapa_loss=0.0001734, whisper_loss=0.09102, over 3886210.56 frames. ], batch size: 70, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:36:08,102 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 16:36:20,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2024-08-12 16:36:22,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1726210.0, ans=0.0 2024-08-12 16:36:25,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.556e+01 2.815e+01 3.284e+01 6.256e+01, threshold=5.630e+01, percent-clipped=1.0 2024-08-12 16:36:54,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1726410.0, ans=0.2 2024-08-12 16:37:08,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1726510.0, ans=0.125 2024-08-12 16:37:13,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1726510.0, ans=0.125 2024-08-12 16:37:24,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13250, loss[loss=0.1048, beats_loss=0.009886, ecapa_loss=0.0001557, whisper_loss=0.09332, over 18918.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01105, ecapa_loss=0.0001732, whisper_loss=0.09159, over 3915485.62 frames. ], batch size: 73, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:37:39,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1726610.0, ans=0.0 2024-08-12 16:37:53,214 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 16:38:07,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1726810.0, ans=0.2 2024-08-12 16:38:12,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1726810.0, ans=0.125 2024-08-12 16:38:13,391 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 16:38:43,194 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 16:38:49,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13300, loss[loss=0.1284, beats_loss=0.009174, ecapa_loss=0.0002132, whisper_loss=0.1171, over 14044.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01108, ecapa_loss=0.0001725, whisper_loss=0.09115, over 3917956.70 frames. ], batch size: 57, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:38:59,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1727110.0, ans=0.1 2024-08-12 16:39:12,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.550e+01 2.829e+01 3.095e+01 6.127e+01, threshold=5.657e+01, percent-clipped=1.0 2024-08-12 16:39:23,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1727310.0, ans=0.125 2024-08-12 16:39:36,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1727410.0, ans=0.1 2024-08-12 16:39:36,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1727410.0, ans=0.2 2024-08-12 16:39:40,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, 
metric=9.27 vs. limit=15.0 2024-08-12 16:39:49,138 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 16:39:55,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:40:09,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13350, loss[loss=0.1098, beats_loss=0.01301, ecapa_loss=0.0002024, whisper_loss=0.09475, over 17383.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001725, whisper_loss=0.09176, over 3915156.84 frames. ], batch size: 72, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:40:11,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 16:40:16,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1727610.0, ans=0.0 2024-08-12 16:40:44,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1727810.0, ans=0.0 2024-08-12 16:40:59,859 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 16:41:00,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1727910.0, ans=0.125 2024-08-12 16:41:06,058 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:41:08,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-12 16:41:31,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13400, loss[loss=0.1105, beats_loss=0.007814, ecapa_loss=0.0001797, whisper_loss=0.1009, over 17603.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001717, whisper_loss=0.09198, over 3955043.97 frames. 
], batch size: 68, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:41:36,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1728110.0, ans=0.0 2024-08-12 16:41:46,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-08-12 16:41:54,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.756e+01 3.172e+01 3.565e+01 5.325e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-12 16:41:56,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1728210.0, ans=0.125 2024-08-12 16:41:56,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1728210.0, ans=0.0 2024-08-12 16:42:07,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-12 16:42:16,563 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-12 16:42:50,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13450, loss[loss=0.1084, beats_loss=0.01358, ecapa_loss=0.0001161, whisper_loss=0.09362, over 17675.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.0001718, whisper_loss=0.09182, over 3931377.18 frames. ], batch size: 66, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:42:51,595 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 16:43:11,720 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.511e+00 2024-08-12 16:43:15,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5 2024-08-12 16:43:40,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1728910.0, ans=0.0 2024-08-12 16:43:51,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1728910.0, ans=0.0 2024-08-12 16:44:07,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1729010.0, ans=0.0 2024-08-12 16:44:09,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1729010.0, ans=0.1 2024-08-12 16:44:12,451 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 16:44:15,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13500, loss[loss=0.1051, beats_loss=0.009407, ecapa_loss=0.0001958, whisper_loss=0.09376, over 19918.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.0001725, whisper_loss=0.09139, over 3909860.31 frames. ], batch size: 81, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:44:38,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.512e+01 2.797e+01 3.062e+01 5.746e+01, threshold=5.594e+01, percent-clipped=0.0 2024-08-12 16:44:46,733 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 16:44:59,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1729310.0, ans=0.1 2024-08-12 16:45:14,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-12 16:45:27,013 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 16:45:38,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13550, loss[loss=0.1092, beats_loss=0.009688, ecapa_loss=0.00013, whisper_loss=0.09826, over 16703.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001722, whisper_loss=0.09119, over 3905012.19 frames. ], batch size: 62, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:45:41,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1729610.0, ans=0.0 2024-08-12 16:45:48,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1729610.0, ans=0.125 2024-08-12 16:45:59,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1729710.0, ans=0.125 2024-08-12 16:46:04,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1729710.0, ans=0.125 2024-08-12 16:46:17,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0 2024-08-12 16:46:41,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. 
limit=15.0 2024-08-12 16:46:52,267 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 16:47:05,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13600, loss[loss=0.09473, beats_loss=0.01169, ecapa_loss=0.0001828, whisper_loss=0.08121, over 14417.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001717, whisper_loss=0.09128, over 3881450.76 frames. ], batch size: 58, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:47:12,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1730110.0, ans=0.5 2024-08-12 16:47:31,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.482e+01 2.733e+01 3.104e+01 2.478e+02, threshold=5.467e+01, percent-clipped=1.0 2024-08-12 16:47:47,983 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 16:47:54,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.45 vs. 
limit=22.5 2024-08-12 16:47:59,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1730410.0, ans=0.0 2024-08-12 16:48:03,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1730410.0, ans=0.125 2024-08-12 16:48:04,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1730410.0, ans=0.0 2024-08-12 16:48:04,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1730410.0, ans=0.125 2024-08-12 16:48:04,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1730410.0, ans=0.0 2024-08-12 16:48:06,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1730410.0, ans=0.2 2024-08-12 16:48:31,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13650, loss[loss=0.08848, beats_loss=0.01398, ecapa_loss=0.0001734, whisper_loss=0.07277, over 14488.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0001728, whisper_loss=0.09128, over 3863389.02 frames. ], batch size: 59, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:48:33,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-12 16:48:33,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1730610.0, ans=15.0 2024-08-12 16:48:35,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1730610.0, ans=0.2 2024-08-12 16:48:36,845 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 16:48:37,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1730610.0, ans=0.02 2024-08-12 16:48:38,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-12 16:48:47,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2024-08-12 16:48:53,977 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 16 from LS+wenet, 24 from Vox, 54 fro AS 2024-08-12 16:49:26,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1730910.0, ans=0.125 2024-08-12 16:49:34,882 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 16:49:43,976 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 16:50:06,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13700, loss[loss=0.08816, beats_loss=0.01055, ecapa_loss=0.0001634, whisper_loss=0.07597, over 13505.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001744, whisper_loss=0.09131, over 3854902.93 frames. ], batch size: 55, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:50:10,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-12 16:50:16,835 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 16:50:34,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.487e+01 2.754e+01 3.214e+01 5.264e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 16:50:43,955 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 16:51:04,301 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 16:51:16,556 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.260e+05 2024-08-12 16:51:25,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=12.0 2024-08-12 16:51:26,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1731510.0, ans=0.125 2024-08-12 16:51:33,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13750, loss[loss=0.1309, beats_loss=0.008139, ecapa_loss=0.0002115, whisper_loss=0.1206, over 20031.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001742, whisper_loss=0.09187, over 3859018.98 frames. ], batch size: 81, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:51:37,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1731610.0, ans=0.125 2024-08-12 16:51:52,644 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 16:52:01,297 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 16:52:06,245 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-12 16:52:11,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. 
limit=8.0 2024-08-12 16:52:14,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1731810.0, ans=0.125 2024-08-12 16:52:16,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1731810.0, ans=0.2 2024-08-12 16:52:25,074 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-12 16:52:45,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1732010.0, ans=0.0 2024-08-12 16:52:58,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1732110.0, ans=0.125 2024-08-12 16:52:59,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13800, loss[loss=0.1414, beats_loss=0.008861, ecapa_loss=0.0001892, whisper_loss=0.1307, over 23553.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.0001752, whisper_loss=0.09183, over 3850851.06 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:53:01,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1732110.0, ans=0.125 2024-08-12 16:53:10,981 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 16:53:17,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2024-08-12 16:53:24,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.542e+01 2.940e+01 3.312e+01 1.437e+02, threshold=5.879e+01, percent-clipped=2.0 2024-08-12 16:53:30,586 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 28 from LS+wenet, 39 from Vox, 29 fro AS 2024-08-12 16:53:32,161 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 16:53:33,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.04 vs. limit=22.5 2024-08-12 16:53:39,265 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 16:53:54,798 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 16:54:09,166 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 16:54:21,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1732510.0, ans=0.125 2024-08-12 16:54:25,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1732510.0, ans=0.125 2024-08-12 16:54:28,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13850, loss[loss=0.1135, beats_loss=0.01257, ecapa_loss=0.0001457, whisper_loss=0.09945, over 22597.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001739, whisper_loss=0.09196, over 3891041.66 frames. ], batch size: 86, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:54:36,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1732610.0, ans=0.125 2024-08-12 16:54:49,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1732710.0, ans=0.0 2024-08-12 16:55:11,100 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 16:55:24,895 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 16:55:26,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1732910.0, ans=0.0 2024-08-12 16:55:52,758 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 16:55:59,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13900, loss[loss=0.1078, beats_loss=0.01095, ecapa_loss=0.000166, whisper_loss=0.09518, over 23467.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01114, ecapa_loss=0.0001745, whisper_loss=0.09161, over 3874550.12 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:56:03,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1733110.0, ans=0.125 2024-08-12 16:56:05,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1733110.0, ans=0.125 2024-08-12 16:56:19,820 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 16:56:25,159 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.084e+02 2024-08-12 16:56:25,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1733210.0, ans=0.0 2024-08-12 16:56:25,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.635e+01 2.870e+01 3.246e+01 6.120e+01, threshold=5.740e+01, percent-clipped=1.0 2024-08-12 16:56:26,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1733210.0, ans=0.125 2024-08-12 16:56:43,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1733310.0, ans=0.0 2024-08-12 16:56:50,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-12 16:56:52,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1733410.0, ans=0.0 2024-08-12 16:56:59,992 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 16:57:08,184 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 16:57:21,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 13950, loss[loss=0.1142, beats_loss=0.01161, ecapa_loss=0.0001804, whisper_loss=0.1007, over 21739.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01107, ecapa_loss=0.0001744, whisper_loss=0.09209, over 3886379.67 frames. 
], batch size: 88, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:57:22,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1733610.0, ans=0.0 2024-08-12 16:57:24,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-12 16:57:34,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1733610.0, ans=0.1 2024-08-12 16:58:05,266 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-12 16:58:26,843 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 9 from Vox, 42 fro AS 2024-08-12 16:58:44,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14000, loss[loss=0.1052, beats_loss=0.00739, ecapa_loss=0.0002754, whisper_loss=0.09503, over 21128.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001736, whisper_loss=0.09256, over 3882150.69 frames. ], batch size: 93, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:58:55,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1734110.0, ans=0.1 2024-08-12 16:58:57,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1734110.0, ans=0.125 2024-08-12 16:59:04,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1734210.0, ans=0.125 2024-08-12 16:59:06,464 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 16:59:09,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.481e+01 2.768e+01 3.199e+01 7.750e+01, threshold=5.536e+01, percent-clipped=1.0 2024-08-12 16:59:11,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1734210.0, ans=15.0 2024-08-12 16:59:28,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1734310.0, ans=0.125 2024-08-12 16:59:47,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1734410.0, ans=0.2 2024-08-12 16:59:51,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1734410.0, ans=0.125 2024-08-12 17:00:04,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2024-08-12 17:00:14,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14050, loss[loss=0.0917, beats_loss=0.01101, ecapa_loss=0.0001887, whisper_loss=0.07881, over 19656.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01109, ecapa_loss=0.0001724, whisper_loss=0.09206, over 3919843.50 frames. ], batch size: 80, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:00:17,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2024-08-12 17:00:26,755 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 17:00:50,235 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 17:01:41,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14100, loss[loss=0.1142, beats_loss=0.009608, ecapa_loss=0.000176, whisper_loss=0.1029, over 16287.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.000172, whisper_loss=0.09242, over 3880156.78 frames. ], batch size: 65, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:02:10,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.510e+01 2.862e+01 3.257e+01 4.688e+01, threshold=5.723e+01, percent-clipped=0.0 2024-08-12 17:02:42,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1735410.0, ans=0.125 2024-08-12 17:03:03,328 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-12 17:03:10,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14150, loss[loss=0.09241, beats_loss=0.008401, ecapa_loss=0.0001616, whisper_loss=0.0824, over 14853.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001736, whisper_loss=0.092, over 3878780.58 frames. ], batch size: 55, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:03:11,040 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 17:03:33,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2024-08-12 17:03:37,166 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 17:03:39,487 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-12 17:03:43,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1735710.0, ans=0.125 2024-08-12 17:04:06,047 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 28 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-12 17:04:20,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1735910.0, ans=0.0 2024-08-12 17:04:38,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1736010.0, ans=0.125 2024-08-12 17:04:39,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1736010.0, ans=0.0 2024-08-12 17:04:45,108 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 17:04:48,710 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 17:04:50,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14200, loss[loss=0.09486, beats_loss=0.01177, ecapa_loss=0.0001134, whisper_loss=0.08196, over 16590.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001739, whisper_loss=0.092, over 3883287.64 frames. ], batch size: 62, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:04:58,287 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 17:05:02,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1736110.0, ans=0.125 2024-08-12 17:05:10,139 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.019e-01 2024-08-12 17:05:11,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1736210.0, ans=0.125 2024-08-12 17:05:14,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.568e+01 2.822e+01 3.210e+01 8.568e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 17:05:37,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1736410.0, ans=0.125 2024-08-12 17:05:41,205 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 17:05:58,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1736510.0, ans=0.125 2024-08-12 17:06:06,277 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 17:06:10,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14250, loss[loss=0.0855, beats_loss=0.01354, ecapa_loss=0.0001277, whisper_loss=0.07068, over 19613.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001734, whisper_loss=0.09169, over 3874330.34 frames. ], batch size: 77, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:06:13,244 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 17:06:16,898 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 17:06:22,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1736610.0, ans=0.125 2024-08-12 17:06:25,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1736610.0, ans=0.0 2024-08-12 17:06:57,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1736810.0, ans=0.1 2024-08-12 17:07:06,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 17:07:18,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-12 17:07:25,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1737010.0, ans=0.025 2024-08-12 17:07:27,333 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 17:07:31,205 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 17:07:44,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14300, loss[loss=0.09727, beats_loss=0.01111, ecapa_loss=0.0001507, whisper_loss=0.08465, over 23217.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01108, ecapa_loss=0.0001715, whisper_loss=0.09181, over 3921877.94 frames. ], batch size: 93, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:08:04,394 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 17:08:10,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.619e+01 2.822e+01 3.259e+01 8.695e+01, threshold=5.643e+01, percent-clipped=1.0 2024-08-12 17:08:18,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1737310.0, ans=0.125 2024-08-12 17:08:30,170 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 17:08:47,397 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-12 17:08:56,219 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 17:08:57,745 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 17:09:07,787 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 17:09:11,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14350, loss[loss=0.09973, beats_loss=0.01009, ecapa_loss=0.0001719, whisper_loss=0.08793, over 18880.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001716, whisper_loss=0.0916, over 3905350.17 frames. ], batch size: 74, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:09:36,376 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 19 from Vox, 13 fro AS 2024-08-12 17:09:43,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2024-08-12 17:09:47,896 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
27 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 17:10:50,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1737910.0, ans=0.125 2024-08-12 17:11:00,470 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.472e-01 2024-08-12 17:11:05,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1738010.0, ans=0.2 2024-08-12 17:11:13,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14400, loss[loss=0.08819, beats_loss=0.01307, ecapa_loss=0.0001839, whisper_loss=0.07329, over 18438.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001734, whisper_loss=0.09217, over 3916551.80 frames. ], batch size: 79, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:11:18,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1738110.0, ans=0.125 2024-08-12 17:11:24,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1738110.0, ans=0.09899494936611666 2024-08-12 17:11:28,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1738110.0, ans=0.0 2024-08-12 17:11:44,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.468e+01 2.751e+01 3.183e+01 4.709e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-12 17:11:52,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2024-08-12 17:12:02,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. 
limit=15.0 2024-08-12 17:12:17,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1738410.0, ans=0.125 2024-08-12 17:12:22,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1738410.0, ans=0.125 2024-08-12 17:12:32,629 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 17:12:35,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1738510.0, ans=0.0 2024-08-12 17:12:47,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1738510.0, ans=0.1 2024-08-12 17:12:52,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 12, batch 14450, loss[loss=0.1065, beats_loss=0.01005, ecapa_loss=0.0002212, whisper_loss=0.09421, over 21116.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001747, whisper_loss=0.09194, over 3926497.43 frames. ], batch size: 91, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:13:04,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-08-12 17:13:16,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-08-12 17:13:26,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1738810.0, ans=0.1 2024-08-12 17:13:29,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1738810.0, ans=0.2 2024-08-12 17:13:35,454 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 17:13:54,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1738910.0, ans=0.2 2024-08-12 17:15:01,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 0, loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001861, whisper_loss=0.09187, over 23852.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001861, whisper_loss=0.09187, over 23852.00 frames. ], batch size: 94, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:15:01,344 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 17:15:45,063 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on ASR_libri: loss=0.255, beats_loss=0, ecapa_loss=0.0005844, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 17:16:01,495 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on SV_voxceleb1: loss=0.004777, beats_loss=0, ecapa_loss=0.0004777, whisper_loss=0, over 939242.00 frames. 2024-08-12 17:18:04,536 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on AT_audioset: loss=0.02416, beats_loss=0.02416, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 17:18:04,539 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 17:18:12,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1739080.0, ans=0.0 2024-08-12 17:18:21,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1739080.0, ans=0.125 2024-08-12 17:18:34,753 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 17:18:55,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.525e+01 2.835e+01 3.382e+01 8.605e+01, threshold=5.671e+01, percent-clipped=1.0 2024-08-12 17:18:57,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-12 17:19:46,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1739380.0, ans=0.05 2024-08-12 17:20:18,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 50, loss[loss=0.07468, beats_loss=0.01089, ecapa_loss=0.0001988, whisper_loss=0.0618, over 19947.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01045, ecapa_loss=0.0001762, whisper_loss=0.09214, over 883583.27 frames. ], batch size: 80, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:20:38,674 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 17:20:56,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1739680.0, ans=0.0 2024-08-12 17:21:03,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1739680.0, ans=0.125 2024-08-12 17:21:15,541 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 17:22:05,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1739980.0, ans=0.125 2024-08-12 17:22:20,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 100, loss[loss=0.08058, beats_loss=0.01457, ecapa_loss=0.0001931, whisper_loss=0.06407, over 15924.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.000178, whisper_loss=0.08935, over 1556499.99 frames. 
], batch size: 69, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:22:26,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1740080.0, ans=0.0 2024-08-12 17:22:31,048 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 17:22:38,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1740080.0, ans=0.0 2024-08-12 17:22:56,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1740180.0, ans=0.125 2024-08-12 17:22:56,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1740180.0, ans=0.07 2024-08-12 17:23:04,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1740180.0, ans=0.0 2024-08-12 17:23:05,596 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.865e+01 3.060e+01 3.356e+01 6.213e+01, threshold=6.120e+01, percent-clipped=1.0 2024-08-12 17:23:15,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-12 17:23:21,457 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-12 17:23:33,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-12 17:23:43,712 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-12 17:23:54,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1740480.0, ans=0.09899494936611666 2024-08-12 17:23:58,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1740480.0, ans=0.1 2024-08-12 17:24:05,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1740480.0, ans=0.07 2024-08-12 17:24:15,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 150, loss[loss=0.08736, beats_loss=0.01158, ecapa_loss=0.0001979, whisper_loss=0.0738, over 18958.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.000174, whisper_loss=0.09046, over 2087879.48 frames. ], batch size: 81, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:25:03,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1740780.0, ans=0.1 2024-08-12 17:25:09,829 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 17:25:32,985 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 17:25:42,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 200, loss[loss=0.09595, beats_loss=0.0118, ecapa_loss=0.0001845, whisper_loss=0.08231, over 18205.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001756, whisper_loss=0.09172, over 2474019.91 frames. ], batch size: 73, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:25:49,055 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 17:25:54,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1741080.0, ans=0.0 2024-08-12 17:26:09,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1741180.0, ans=0.2 2024-08-12 17:26:09,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1741180.0, ans=0.125 2024-08-12 17:26:09,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2024-08-12 17:26:11,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.594e+01 3.008e+01 3.381e+01 4.307e+01, threshold=6.015e+01, percent-clipped=0.0 2024-08-12 17:26:32,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5 2024-08-12 17:26:40,925 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 17:27:00,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 250, loss[loss=0.1131, beats_loss=0.009525, ecapa_loss=0.0001808, whisper_loss=0.1017, over 14994.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001742, whisper_loss=0.09178, over 2782106.33 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:27:12,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. 
limit=6.0 2024-08-12 17:27:14,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1741580.0, ans=10.0 2024-08-12 17:27:15,064 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 17:27:17,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2024-08-12 17:27:39,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1741780.0, ans=0.125 2024-08-12 17:27:51,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-12 17:27:52,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1741880.0, ans=0.1 2024-08-12 17:28:03,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1741980.0, ans=0.0 2024-08-12 17:28:07,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1741980.0, ans=0.1 2024-08-12 17:28:07,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-12 17:28:17,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1742080.0, ans=0.125 2024-08-12 17:28:17,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 300, loss[loss=0.1218, beats_loss=0.01038, ecapa_loss=0.0001686, whisper_loss=0.1097, over 14501.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001733, whisper_loss=0.09152, over 2985734.93 frames. ], batch size: 55, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:28:20,561 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 17:28:27,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1742080.0, ans=0.0 2024-08-12 17:28:38,437 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 17:28:44,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.349e+01 2.732e+01 3.113e+01 6.634e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-12 17:29:09,249 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.055e+01 2024-08-12 17:29:16,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1742480.0, ans=0.125 2024-08-12 17:29:21,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1742480.0, ans=0.1 2024-08-12 17:29:32,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 350, loss[loss=0.08635, beats_loss=0.009414, ecapa_loss=0.0002274, whisper_loss=0.07466, over 14726.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001729, whisper_loss=0.0918, over 3172384.93 frames. ], batch size: 59, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:29:34,177 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 17:29:42,770 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 17:29:45,998 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.089e+05 2024-08-12 17:30:08,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2024-08-12 17:30:34,880 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 17:30:44,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 400, loss[loss=0.09195, beats_loss=0.01149, ecapa_loss=0.000144, whisper_loss=0.07902, over 21512.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001734, whisper_loss=0.09141, over 3314348.01 frames. ], batch size: 85, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:30:45,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1743080.0, ans=0.2 2024-08-12 17:30:52,483 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 17:30:54,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743080.0, ans=0.1 2024-08-12 17:30:54,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1743080.0, ans=0.125 2024-08-12 17:30:57,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-12 17:30:58,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1743180.0, ans=0.09899494936611666 2024-08-12 17:31:01,064 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 17:31:01,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1743180.0, ans=10.0 2024-08-12 17:31:10,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.525e+01 2.765e+01 3.244e+01 1.385e+02, threshold=5.529e+01, percent-clipped=2.0 2024-08-12 17:31:30,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743380.0, ans=0.1 2024-08-12 17:31:53,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1743480.0, ans=0.125 2024-08-12 17:31:58,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 450, loss[loss=0.09908, beats_loss=0.01262, ecapa_loss=0.0001351, whisper_loss=0.0851, over 23283.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001733, whisper_loss=0.09157, over 3447954.72 frames. ], batch size: 92, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:31:58,428 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 17:32:10,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0 2024-08-12 17:32:14,331 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 17:32:43,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2024-08-12 17:32:50,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1743880.0, ans=0.125 2024-08-12 17:32:56,854 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 17:33:02,953 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 17:33:04,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1743980.0, ans=0.125 2024-08-12 17:33:04,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1743980.0, ans=0.125 2024-08-12 17:33:10,216 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 17:33:11,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 500, loss[loss=0.09575, beats_loss=0.01206, ecapa_loss=0.0001921, whisper_loss=0.08177, over 21656.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01081, ecapa_loss=0.0001728, whisper_loss=0.09135, over 3534799.50 frames. ], batch size: 92, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:33:19,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1744080.0, ans=0.125 2024-08-12 17:33:35,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1744180.0, ans=0.04949747468305833 2024-08-12 17:33:40,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.531e+01 2.780e+01 3.170e+01 4.119e+01, threshold=5.561e+01, percent-clipped=0.0 2024-08-12 17:33:51,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2024-08-12 17:33:52,189 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 17:33:57,138 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 17:33:57,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1744280.0, ans=0.1 2024-08-12 17:34:01,985 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-12 17:34:17,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1744480.0, ans=0.1 2024-08-12 17:34:20,345 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 17:34:24,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1744480.0, ans=0.0 2024-08-12 17:34:30,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 550, loss[loss=0.1116, beats_loss=0.009694, ecapa_loss=0.000186, whisper_loss=0.1001, over 18595.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.000173, whisper_loss=0.09217, over 3617698.93 frames. ], batch size: 76, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:34:32,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-12 17:34:58,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1744680.0, ans=0.015 2024-08-12 17:35:03,699 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 17:35:05,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1744780.0, ans=0.125 2024-08-12 17:35:08,212 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 17:35:08,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1744780.0, ans=0.0 2024-08-12 17:35:29,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1744980.0, ans=0.1 2024-08-12 17:35:45,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 600, loss[loss=0.1048, beats_loss=0.01054, ecapa_loss=0.0001929, whisper_loss=0.09235, over 17632.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01068, ecapa_loss=0.0001736, whisper_loss=0.09229, over 3682404.21 frames. ], batch size: 71, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:35:52,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1745080.0, ans=0.125 2024-08-12 17:35:55,445 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 17:36:11,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.517e+01 2.834e+01 3.150e+01 6.498e+01, threshold=5.667e+01, percent-clipped=2.0 2024-08-12 17:36:26,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=8.0 2024-08-12 17:36:41,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=12.0 2024-08-12 17:36:49,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1745480.0, ans=0.125 2024-08-12 17:36:57,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 650, loss[loss=0.09502, beats_loss=0.0131, ecapa_loss=0.0001762, whisper_loss=0.08016, over 16642.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001725, whisper_loss=0.09192, over 3700095.80 frames. 
], batch size: 68, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:37:10,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2024-08-12 17:37:14,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=1745680.0, ans=12.0 2024-08-12 17:37:19,112 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 17:37:20,292 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-12 17:37:24,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1745680.0, ans=0.125 2024-08-12 17:37:33,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1745780.0, ans=0.125 2024-08-12 17:37:47,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1745880.0, ans=0.05 2024-08-12 17:38:00,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1745980.0, ans=0.0 2024-08-12 17:38:10,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 700, loss[loss=0.101, beats_loss=0.01343, ecapa_loss=0.0001282, whisper_loss=0.08628, over 23609.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01074, ecapa_loss=0.0001714, whisper_loss=0.09176, over 3733699.83 frames. ], batch size: 93, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:38:13,870 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 17:38:27,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1746180.0, ans=0.125 2024-08-12 17:38:37,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.446e+01 2.651e+01 3.040e+01 5.006e+01, threshold=5.302e+01, percent-clipped=0.0 2024-08-12 17:38:47,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1746280.0, ans=0.125 2024-08-12 17:38:50,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-08-12 17:38:51,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1746280.0, ans=0.1 2024-08-12 17:39:00,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1746380.0, ans=0.0 2024-08-12 17:39:16,975 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-12 17:39:24,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 750, loss[loss=0.09351, beats_loss=0.009112, ecapa_loss=0.0001821, whisper_loss=0.08258, over 18321.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001694, whisper_loss=0.09135, over 3784997.88 frames. ], batch size: 72, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:39:55,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1746780.0, ans=0.035 2024-08-12 17:40:22,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1746980.0, ans=0.125 2024-08-12 17:40:29,936 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
26 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-12 17:40:32,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-12 17:40:33,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1746980.0, ans=0.0 2024-08-12 17:40:36,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1747080.0, ans=0.0 2024-08-12 17:40:36,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-12 17:40:37,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 800, loss[loss=0.109, beats_loss=0.01137, ecapa_loss=0.0001588, whisper_loss=0.096, over 13922.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001696, whisper_loss=0.09149, over 3801701.01 frames. ], batch size: 54, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:40:42,816 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 17:40:51,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-12 17:40:54,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1747180.0, ans=0.0 2024-08-12 17:41:03,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.398e+01 2.726e+01 3.050e+01 4.286e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 17:41:21,389 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 17:41:24,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1747380.0, ans=0.125 2024-08-12 17:41:41,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1747480.0, ans=0.2 2024-08-12 17:41:41,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=12.0 2024-08-12 17:41:44,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-12 17:41:51,283 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 850, loss[loss=0.08062, beats_loss=0.01178, ecapa_loss=0.0002027, whisper_loss=0.06681, over 15914.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001696, whisper_loss=0.09075, over 3794755.39 frames. ], batch size: 66, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:41:52,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1747580.0, ans=0.125 2024-08-12 17:41:53,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1747580.0, ans=0.2 2024-08-12 17:41:59,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1747580.0, ans=0.2 2024-08-12 17:42:04,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1747580.0, ans=0.125 2024-08-12 17:42:08,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.63 vs. 
limit=22.5 2024-08-12 17:42:22,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1747780.0, ans=0.125 2024-08-12 17:42:31,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1747780.0, ans=0.0 2024-08-12 17:42:37,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1747880.0, ans=0.125 2024-08-12 17:42:39,832 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 17:42:40,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. limit=5.0 2024-08-12 17:43:04,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-12 17:43:05,139 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 17:43:06,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 900, loss[loss=0.1118, beats_loss=0.01206, ecapa_loss=0.0001368, whisper_loss=0.09836, over 23962.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001675, whisper_loss=0.09087, over 3781909.54 frames. ], batch size: 92, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:43:32,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.406e+01 2.653e+01 2.914e+01 6.572e+01, threshold=5.306e+01, percent-clipped=1.0 2024-08-12 17:43:46,708 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 17:43:48,214 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 17:44:06,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1748480.0, ans=0.125 2024-08-12 17:44:17,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 950, loss[loss=0.1125, beats_loss=0.01176, ecapa_loss=0.0001355, whisper_loss=0.09942, over 18593.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01084, ecapa_loss=0.0001676, whisper_loss=0.09032, over 3774010.61 frames. ], batch size: 70, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:44:23,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1748580.0, ans=0.0 2024-08-12 17:44:34,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1748680.0, ans=0.125 2024-08-12 17:44:36,636 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 17:44:49,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1748780.0, ans=0.0 2024-08-12 17:45:11,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1748880.0, ans=0.125 2024-08-12 17:45:27,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1000, loss[loss=0.1102, beats_loss=0.01102, ecapa_loss=0.0001807, whisper_loss=0.0974, over 17656.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001689, whisper_loss=0.09084, over 3793637.81 frames. 
], batch size: 71, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:45:28,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1749080.0, ans=0.0 2024-08-12 17:45:28,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=1749080.0, ans=15.0 2024-08-12 17:45:53,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.479e+01 2.731e+01 3.171e+01 4.511e+01, threshold=5.462e+01, percent-clipped=0.0 2024-08-12 17:45:56,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1749280.0, ans=0.0 2024-08-12 17:45:56,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=12.0 2024-08-12 17:46:07,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1749280.0, ans=0.1 2024-08-12 17:46:10,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1749280.0, ans=0.125 2024-08-12 17:46:20,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1749380.0, ans=0.0 2024-08-12 17:46:33,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-08-12 17:46:41,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1050, loss[loss=0.1018, beats_loss=0.01008, ecapa_loss=0.0001774, whisper_loss=0.08999, over 18634.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.000169, whisper_loss=0.09042, over 3794997.23 frames. 
], batch size: 73, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:47:01,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1749680.0, ans=0.0 2024-08-12 17:47:04,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1749680.0, ans=0.125 2024-08-12 17:47:12,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1749780.0, ans=0.0 2024-08-12 17:47:30,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1749880.0, ans=0.125 2024-08-12 17:47:37,451 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 17:47:44,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1749980.0, ans=0.125 2024-08-12 17:47:44,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1749980.0, ans=0.2 2024-08-12 17:47:57,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1100, loss[loss=0.1039, beats_loss=0.01169, ecapa_loss=0.0001568, whisper_loss=0.09066, over 17285.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001697, whisper_loss=0.0907, over 3784791.13 frames. ], batch size: 68, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:47:58,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2024-08-12 17:47:59,324 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 17:48:01,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. 
limit=15.0 2024-08-12 17:48:24,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.582e+01 2.825e+01 3.154e+01 4.424e+01, threshold=5.651e+01, percent-clipped=0.0 2024-08-12 17:48:28,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1750280.0, ans=0.1 2024-08-12 17:48:33,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1750280.0, ans=0.125 2024-08-12 17:48:57,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1750380.0, ans=0.125 2024-08-12 17:49:12,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1750480.0, ans=0.0 2024-08-12 17:49:19,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-12 17:49:21,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1750580.0, ans=0.015 2024-08-12 17:49:23,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1150, loss[loss=0.1091, beats_loss=0.009629, ecapa_loss=0.0001581, whisper_loss=0.0979, over 19930.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001679, whisper_loss=0.09073, over 3783098.26 frames. ], batch size: 77, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:49:24,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1750580.0, ans=0.95 2024-08-12 17:49:49,313 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 17:50:11,442 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 17:50:15,828 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 17:50:21,998 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 17:50:27,573 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.377e+05 2024-08-12 17:50:51,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1200, loss[loss=0.09846, beats_loss=0.01162, ecapa_loss=0.0001792, whisper_loss=0.08505, over 22777.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001698, whisper_loss=0.0909, over 3789366.16 frames. ], batch size: 91, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:50:59,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1751080.0, ans=0.2 2024-08-12 17:51:01,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1751080.0, ans=0.07 2024-08-12 17:51:06,784 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:51:27,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.378e+01 2.599e+01 3.054e+01 4.994e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-12 17:51:54,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1751380.0, ans=0.125 2024-08-12 17:52:02,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-12 17:52:31,778 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 17:52:37,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1250, loss[loss=0.1176, beats_loss=0.01064, ecapa_loss=0.000163, whisper_loss=0.1054, over 22891.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01084, ecapa_loss=0.0001693, whisper_loss=0.0903, over 3800012.78 frames. ], batch size: 91, lr: 4.93e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:52:39,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1751580.0, ans=0.125 2024-08-12 17:53:05,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1751680.0, ans=0.0 2024-08-12 17:53:18,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1751680.0, ans=0.125 2024-08-12 17:53:21,345 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 17:53:22,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1751780.0, ans=0.125 2024-08-12 17:53:24,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1751780.0, ans=0.2 2024-08-12 17:53:34,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1751780.0, ans=0.2 2024-08-12 17:53:36,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1751780.0, ans=0.125 2024-08-12 17:53:36,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:53:39,561 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 17:53:41,130 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 17:53:49,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1751880.0, ans=0.2 2024-08-12 17:53:56,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1751880.0, ans=0.125 2024-08-12 17:54:17,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1751980.0, ans=0.0 2024-08-12 17:54:27,512 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1300, loss[loss=0.1166, beats_loss=0.01107, ecapa_loss=0.0001512, whisper_loss=0.104, over 23425.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001677, whisper_loss=0.09089, over 3834200.46 frames. ], batch size: 89, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:54:31,249 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 17:54:57,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-12 17:55:06,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.406e+01 2.650e+01 2.964e+01 4.612e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-12 17:55:14,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1752280.0, ans=0.1 2024-08-12 17:55:17,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1752280.0, ans=0.05 2024-08-12 17:55:19,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1752280.0, ans=0.125 2024-08-12 17:55:24,672 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 17:55:53,485 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 17:56:14,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1350, loss[loss=0.1089, beats_loss=0.008932, ecapa_loss=0.0001747, whisper_loss=0.09821, over 16753.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0109, ecapa_loss=0.0001669, whisper_loss=0.09039, over 3861054.99 frames. ], batch size: 65, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:56:21,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-12 17:56:31,082 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 17:56:39,494 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 17:57:03,907 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 7 from Vox, 27 fro AS 2024-08-12 17:57:25,105 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 17:57:31,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1752980.0, ans=0.035 2024-08-12 17:57:35,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1752980.0, ans=0.2 2024-08-12 17:57:38,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1400, loss[loss=0.114, beats_loss=0.01425, ecapa_loss=0.0001772, whisper_loss=0.09801, over 17662.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001669, whisper_loss=0.09126, over 3841183.59 frames. ], batch size: 71, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:57:38,265 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 17:57:51,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1753180.0, ans=0.04949747468305833 2024-08-12 17:57:56,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1753180.0, ans=0.125 2024-08-12 17:58:00,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1753180.0, ans=0.125 2024-08-12 17:58:04,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.419e+01 2.702e+01 3.143e+01 2.017e+02, threshold=5.404e+01, percent-clipped=3.0 2024-08-12 17:58:32,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1753380.0, ans=0.0 2024-08-12 17:58:40,409 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 17:59:02,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1450, loss[loss=0.09144, beats_loss=0.009821, ecapa_loss=0.000174, whisper_loss=0.07988, over 14278.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001675, whisper_loss=0.09108, over 3805789.68 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:59:15,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1753580.0, ans=0.0 2024-08-12 17:59:20,359 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 17:59:48,685 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 17:59:50,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1753880.0, ans=0.0 2024-08-12 18:00:10,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1753980.0, ans=0.1 2024-08-12 18:00:20,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-08-12 18:00:21,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1500, loss[loss=0.08981, beats_loss=0.009992, ecapa_loss=0.0001811, whisper_loss=0.07801, over 13829.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01083, ecapa_loss=0.0001665, whisper_loss=0.08993, over 3800310.58 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:00:32,033 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 18:00:35,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1754080.0, ans=0.0 2024-08-12 18:00:35,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1754080.0, ans=0.0 2024-08-12 18:00:50,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.459e+01 2.780e+01 3.185e+01 5.902e+01, threshold=5.561e+01, percent-clipped=1.0 2024-08-12 18:00:51,149 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 18:01:15,111 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 18:01:40,009 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 18:01:41,427 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1550, loss[loss=0.1101, beats_loss=0.0109, ecapa_loss=0.0001493, whisper_loss=0.09767, over 19340.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001663, whisper_loss=0.09076, over 3822404.94 frames. ], batch size: 77, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:02:12,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1754780.0, ans=0.125 2024-08-12 18:02:23,225 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 18:02:44,945 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 18:02:58,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1600, loss[loss=0.106, beats_loss=0.01025, ecapa_loss=0.0001915, whisper_loss=0.09388, over 22690.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001653, whisper_loss=0.09053, over 3790099.41 frames. ], batch size: 91, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:03:13,342 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-12 18:03:21,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1755180.0, ans=10.0 2024-08-12 18:03:23,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1755180.0, ans=0.1 2024-08-12 18:03:25,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.499e+01 2.878e+01 3.295e+01 8.050e+01, threshold=5.757e+01, percent-clipped=1.0 2024-08-12 18:03:35,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-12 18:03:41,576 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 18:04:14,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1650, loss[loss=0.08079, beats_loss=0.01254, ecapa_loss=0.000186, whisper_loss=0.06639, over 20880.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001659, whisper_loss=0.09062, over 3799187.17 frames. ], batch size: 89, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:04:15,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-12 18:04:20,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1755580.0, ans=0.0 2024-08-12 18:04:28,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0 2024-08-12 18:04:34,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. 
limit=10.0 2024-08-12 18:04:37,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1755680.0, ans=0.125 2024-08-12 18:04:43,365 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 18:04:51,273 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 29 from Vox, 21 fro AS 2024-08-12 18:04:53,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1755780.0, ans=0.125 2024-08-12 18:04:58,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1755880.0, ans=0.125 2024-08-12 18:05:29,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1700, loss[loss=0.1119, beats_loss=0.008578, ecapa_loss=0.0001553, whisper_loss=0.1018, over 18901.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001653, whisper_loss=0.09083, over 3819969.04 frames. ], batch size: 71, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:05:37,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-12 18:05:50,388 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 18:05:56,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.398e+01 2.715e+01 2.937e+01 4.103e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 18:06:13,372 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 18:06:25,013 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 18:06:27,924 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 18:06:42,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1750, loss[loss=0.1064, beats_loss=0.0113, ecapa_loss=0.0001474, whisper_loss=0.09362, over 17439.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01087, ecapa_loss=0.0001649, whisper_loss=0.09062, over 3828426.33 frames. ], batch size: 69, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:06:56,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1756680.0, ans=22.5 2024-08-12 18:07:17,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1756780.0, ans=0.1 2024-08-12 18:07:20,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1756780.0, ans=0.0 2024-08-12 18:07:42,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1756980.0, ans=0.025 2024-08-12 18:07:43,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1756980.0, ans=0.1 2024-08-12 18:07:51,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1756980.0, ans=0.125 2024-08-12 18:07:55,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1800, loss[loss=0.1272, beats_loss=0.01064, ecapa_loss=0.0001715, whisper_loss=0.1149, over 19476.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001644, whisper_loss=0.09081, over 3854440.60 frames. 
], batch size: 76, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:08:21,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.466e+01 2.734e+01 3.019e+01 6.645e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 18:08:29,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1757280.0, ans=0.0 2024-08-12 18:08:33,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1757280.0, ans=0.125 2024-08-12 18:08:33,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1757280.0, ans=0.2 2024-08-12 18:08:34,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1757280.0, ans=0.125 2024-08-12 18:08:37,004 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 18:08:43,684 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-12 18:08:45,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1757380.0, ans=0.0 2024-08-12 18:08:47,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=15.0 2024-08-12 18:08:55,168 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 18:08:55,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1757480.0, ans=0.1 2024-08-12 18:09:08,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1850, loss[loss=0.09634, beats_loss=0.01313, ecapa_loss=0.0001196, whisper_loss=0.08202, over 18293.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001659, whisper_loss=0.09087, over 3842282.79 frames. ], batch size: 70, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:09:09,097 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 18:09:27,938 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 18:09:37,786 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 18:09:42,341 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 18:09:52,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1757880.0, ans=0.1 2024-08-12 18:09:58,118 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 18:09:59,420 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 18:10:09,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1757980.0, ans=0.1 2024-08-12 18:10:12,968 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 18:10:20,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1758080.0, ans=0.1 2024-08-12 18:10:20,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1900, loss[loss=0.122, beats_loss=0.009613, ecapa_loss=0.0001486, whisper_loss=0.1109, over 18223.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001677, whisper_loss=0.09156, over 3827801.11 frames. 
], batch size: 70, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:10:21,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1758080.0, ans=0.2 2024-08-12 18:10:21,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2024-08-12 18:10:22,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1758080.0, ans=0.2 2024-08-12 18:10:25,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1758080.0, ans=0.1 2024-08-12 18:10:47,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.395e+01 2.725e+01 3.038e+01 6.504e+01, threshold=5.449e+01, percent-clipped=3.0 2024-08-12 18:10:50,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1758280.0, ans=0.04949747468305833 2024-08-12 18:10:52,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1758280.0, ans=0.1 2024-08-12 18:10:57,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1758280.0, ans=0.125 2024-08-12 18:11:04,940 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 18:11:11,842 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 18:11:34,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 1950, loss[loss=0.07996, beats_loss=0.01207, ecapa_loss=0.0001598, whisper_loss=0.06629, over 14486.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.000169, whisper_loss=0.09106, over 3796822.42 frames. 
], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:11:34,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1758580.0, ans=0.125 2024-08-12 18:11:47,163 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 18:12:26,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1758880.0, ans=0.2 2024-08-12 18:12:42,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1758980.0, ans=0.0 2024-08-12 18:12:48,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2000, loss[loss=0.1324, beats_loss=0.009192, ecapa_loss=0.0001936, whisper_loss=0.1213, over 19881.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001696, whisper_loss=0.09122, over 3817766.34 frames. ], batch size: 78, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:13:15,377 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.512e+01 2.812e+01 3.299e+01 5.299e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 18:13:17,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1759280.0, ans=0.125 2024-08-12 18:13:18,021 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 18:13:30,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. 
limit=22.5 2024-08-12 18:13:50,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1759480.0, ans=0.1 2024-08-12 18:13:58,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1759480.0, ans=0.125 2024-08-12 18:13:58,896 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 18:14:00,477 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 18:14:01,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1759580.0, ans=0.125 2024-08-12 18:14:01,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2050, loss[loss=0.1155, beats_loss=0.01061, ecapa_loss=0.0001843, whisper_loss=0.103, over 19824.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001696, whisper_loss=0.09154, over 3803024.23 frames. ], batch size: 77, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:14:05,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1759580.0, ans=0.125 2024-08-12 18:14:15,407 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 18:14:17,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2024-08-12 18:14:33,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1759780.0, ans=0.125 2024-08-12 18:14:53,660 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 18:14:54,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1759880.0, ans=0.125 2024-08-12 18:15:02,428 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 18:15:12,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-08-12 18:15:18,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2100, loss[loss=0.09763, beats_loss=0.01384, ecapa_loss=0.0001055, whisper_loss=0.08273, over 17992.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001697, whisper_loss=0.09046, over 3780081.34 frames. ], batch size: 69, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:15:21,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-12 18:15:23,533 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 18:15:24,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1760080.0, ans=0.0 2024-08-12 18:15:43,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.436e+01 2.700e+01 3.111e+01 5.079e+01, threshold=5.401e+01, percent-clipped=0.0 2024-08-12 18:15:46,303 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 18:15:55,845 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 18:15:56,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1760280.0, ans=0.0 2024-08-12 18:16:05,275 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.517e+05 2024-08-12 18:16:09,488 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:16:12,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-12 18:16:14,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1760380.0, ans=0.1 2024-08-12 18:16:17,511 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 18:16:30,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2150, loss[loss=0.1005, beats_loss=0.009988, ecapa_loss=0.0001326, whisper_loss=0.08921, over 16861.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001706, whisper_loss=0.09115, over 3761081.89 frames. ], batch size: 61, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:16:38,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2024-08-12 18:16:56,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1760780.0, ans=0.0 2024-08-12 18:17:00,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. 
limit=6.0 2024-08-12 18:17:19,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1760880.0, ans=0.0 2024-08-12 18:17:21,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1760880.0, ans=0.125 2024-08-12 18:17:29,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1760980.0, ans=0.2 2024-08-12 18:17:37,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2200, loss[loss=0.1175, beats_loss=0.009838, ecapa_loss=0.0001573, whisper_loss=0.1061, over 17661.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001684, whisper_loss=0.09165, over 3779146.55 frames. ], batch size: 66, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:17:42,698 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 18:18:00,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.455e+01 2.695e+01 3.002e+01 4.139e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-12 18:18:09,622 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 18:18:09,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1761280.0, ans=0.125 2024-08-12 18:18:24,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1761380.0, ans=0.1 2024-08-12 18:18:36,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1761480.0, ans=0.0 2024-08-12 18:18:39,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1761480.0, ans=0.125 2024-08-12 18:18:42,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-12 18:18:42,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2250, loss[loss=0.1028, beats_loss=0.01155, ecapa_loss=0.0001376, whisper_loss=0.08984, over 17699.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.00017, whisper_loss=0.09232, over 3801902.08 frames. ], batch size: 65, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:18:50,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1761580.0, ans=0.125 2024-08-12 18:19:00,607 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 18:19:05,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1761680.0, ans=0.05 2024-08-12 18:19:24,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761880.0, ans=0.1 2024-08-12 18:19:28,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-12 18:19:42,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1761980.0, ans=0.0 2024-08-12 18:19:47,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2300, loss[loss=0.08488, beats_loss=0.01138, ecapa_loss=0.0001798, whisper_loss=0.0717, over 16420.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01085, ecapa_loss=0.0001695, whisper_loss=0.09272, over 3835182.07 frames. ], batch size: 64, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:19:48,489 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 18:19:51,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=12.0 2024-08-12 18:20:00,360 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 18:20:05,988 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.693e+05 2024-08-12 18:20:10,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.458e+01 2.734e+01 3.155e+01 5.696e+01, threshold=5.468e+01, percent-clipped=1.0 2024-08-12 18:20:28,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1762380.0, ans=0.125 2024-08-12 18:20:39,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=15.0 2024-08-12 18:20:52,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2350, loss[loss=0.09765, beats_loss=0.01059, ecapa_loss=0.0001985, whisper_loss=0.08508, over 22693.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01078, ecapa_loss=0.0001706, whisper_loss=0.09357, over 3880131.92 frames. ], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:21:08,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.37 vs. limit=22.5 2024-08-12 18:21:10,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1762680.0, ans=0.125 2024-08-12 18:21:12,852 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 18:21:14,114 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 18:21:20,665 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 18:21:24,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1762780.0, ans=0.2 2024-08-12 18:21:30,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1762780.0, ans=0.125 2024-08-12 18:21:35,387 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 18:21:39,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1762880.0, ans=0.125 2024-08-12 18:21:44,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1762980.0, ans=0.125 2024-08-12 18:21:58,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2400, loss[loss=0.1093, beats_loss=0.01032, ecapa_loss=0.0001723, whisper_loss=0.09721, over 22873.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01077, ecapa_loss=0.0001715, whisper_loss=0.09335, over 3877933.28 frames. ], batch size: 88, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:22:03,902 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05874495208263397, model_norm_threshold=54.68092727661133 2024-08-12 18:22:04,071 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.484e+05, grad_sumsq=9.566e+04, orig_rms_sq=8.869e+00 2024-08-12 18:22:18,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2024-08-12 18:22:22,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.513e+01 2.845e+01 3.166e+01 9.308e+02, threshold=5.690e+01, percent-clipped=1.0 2024-08-12 18:22:28,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1763280.0, ans=0.95 2024-08-12 18:22:34,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2024-08-12 18:22:37,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1763380.0, ans=0.0 2024-08-12 18:22:42,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1763380.0, ans=0.0 2024-08-12 18:23:04,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2450, loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001752, whisper_loss=0.09123, over 17113.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0108, ecapa_loss=0.0001716, whisper_loss=0.09261, over 3889194.07 frames. ], batch size: 69, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:23:14,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1763580.0, ans=0.125 2024-08-12 18:23:19,076 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
12 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 18:23:29,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1763780.0, ans=0.0 2024-08-12 18:23:29,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1763780.0, ans=0.1 2024-08-12 18:23:31,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2024-08-12 18:23:49,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1763880.0, ans=0.0 2024-08-12 18:23:59,776 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 18:24:00,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1763980.0, ans=0.0 2024-08-12 18:24:00,901 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 18:24:01,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1763980.0, ans=0.2 2024-08-12 18:24:04,873 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 18:24:09,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2500, loss[loss=0.1026, beats_loss=0.01092, ecapa_loss=0.0001869, whisper_loss=0.08984, over 18883.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001713, whisper_loss=0.09232, over 3878115.27 frames. ], batch size: 74, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:24:21,700 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 18:24:27,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1764180.0, ans=0.0 2024-08-12 18:24:32,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.522e+01 2.839e+01 3.431e+01 9.983e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 18:24:43,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1764280.0, ans=0.2 2024-08-12 18:24:49,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1764380.0, ans=0.2 2024-08-12 18:24:50,316 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 18:24:56,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1764380.0, ans=0.1 2024-08-12 18:24:56,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1764380.0, ans=0.09899494936611666 2024-08-12 18:25:15,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2550, loss[loss=0.1219, beats_loss=0.01006, ecapa_loss=0.0001808, whisper_loss=0.1101, over 23956.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01076, ecapa_loss=0.0001714, whisper_loss=0.09283, over 3914384.09 frames. ], batch size: 93, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:25:44,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1764780.0, ans=0.0 2024-08-12 18:25:52,051 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
29 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 18:25:57,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1764880.0, ans=0.125 2024-08-12 18:26:01,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1764880.0, ans=0.125 2024-08-12 18:26:06,427 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 18:26:06,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1764980.0, ans=0.1 2024-08-12 18:26:07,707 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 18:26:20,705 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2600, loss[loss=0.09067, beats_loss=0.01145, ecapa_loss=0.0001451, whisper_loss=0.07777, over 18852.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001727, whisper_loss=0.09175, over 3909380.07 frames. ], batch size: 73, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:26:21,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1765080.0, ans=0.1 2024-08-12 18:26:27,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1765080.0, ans=0.0 2024-08-12 18:26:31,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1765080.0, ans=0.2 2024-08-12 18:26:43,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.522e+01 2.874e+01 3.178e+01 1.791e+02, threshold=5.747e+01, percent-clipped=2.0 2024-08-12 18:26:47,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.01 vs. 
limit=15.0 2024-08-12 18:27:02,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1765380.0, ans=0.0 2024-08-12 18:27:07,342 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 18:27:12,360 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 18:27:19,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1765480.0, ans=0.125 2024-08-12 18:27:21,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-08-12 18:27:25,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2650, loss[loss=0.09566, beats_loss=0.01052, ecapa_loss=0.0001703, whisper_loss=0.08344, over 22359.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001729, whisper_loss=0.09137, over 3905236.28 frames. ], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:27:35,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1765580.0, ans=0.125 2024-08-12 18:27:37,622 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-12 18:27:46,499 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 18:27:50,439 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 18:27:50,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1765780.0, ans=0.0 2024-08-12 18:28:00,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1765780.0, ans=0.0 2024-08-12 18:28:10,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1765880.0, ans=0.125 2024-08-12 18:28:22,517 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 18:28:25,115 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 18:28:31,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2700, loss[loss=0.1075, beats_loss=0.0113, ecapa_loss=0.0001529, whisper_loss=0.09467, over 18278.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001727, whisper_loss=0.09097, over 3897777.75 frames. ], batch size: 71, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:28:31,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1766080.0, ans=0.125 2024-08-12 18:28:39,307 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 18:28:41,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.83 vs. 
limit=22.5 2024-08-12 18:28:48,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1766180.0, ans=0.5 2024-08-12 18:28:54,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.345e+01 2.624e+01 3.036e+01 4.476e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 18:28:57,742 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 18:29:02,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1766280.0, ans=0.125 2024-08-12 18:29:16,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1766380.0, ans=0.0 2024-08-12 18:29:23,943 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 13 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 18:29:29,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1766480.0, ans=0.1 2024-08-12 18:29:36,516 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2750, loss[loss=0.09203, beats_loss=0.01315, ecapa_loss=0.0001728, whisper_loss=0.07715, over 22026.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01099, ecapa_loss=0.0001709, whisper_loss=0.09064, over 3882846.82 frames. ], batch size: 93, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:30:14,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1766780.0, ans=0.2 2024-08-12 18:30:16,162 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 18:30:20,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1766880.0, ans=0.125 2024-08-12 18:30:25,515 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-12 18:30:31,694 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 18:30:32,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1766980.0, ans=0.125 2024-08-12 18:30:37,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1766980.0, ans=0.0 2024-08-12 18:30:37,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1766980.0, ans=0.125 2024-08-12 18:30:42,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2800, loss[loss=0.1047, beats_loss=0.0112, ecapa_loss=0.0001821, whisper_loss=0.09169, over 17764.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.00017, whisper_loss=0.0906, over 3872009.59 frames. ], batch size: 73, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:30:46,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1767080.0, ans=0.2 2024-08-12 18:30:48,851 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 18:30:56,774 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 18:31:02,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1767180.0, ans=0.125 2024-08-12 18:31:06,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.517e+01 2.667e+01 2.964e+01 5.320e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-12 18:31:25,437 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
20 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-12 18:31:38,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1767480.0, ans=0.0 2024-08-12 18:31:46,355 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-12 18:31:48,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2850, loss[loss=0.09069, beats_loss=0.01231, ecapa_loss=0.0001648, whisper_loss=0.07673, over 20986.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01095, ecapa_loss=0.0001702, whisper_loss=0.0907, over 3872928.85 frames. ], batch size: 89, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:31:50,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1767580.0, ans=10.0 2024-08-12 18:31:54,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1767580.0, ans=0.2 2024-08-12 18:31:59,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-12 18:32:01,959 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 18:32:04,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1767680.0, ans=0.0 2024-08-12 18:32:05,651 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 18:32:06,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. 
limit=15.0 2024-08-12 18:32:16,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1767780.0, ans=0.05 2024-08-12 18:32:16,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1767780.0, ans=0.2 2024-08-12 18:32:21,557 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 18:32:24,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=12.0 2024-08-12 18:32:36,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1767880.0, ans=0.0 2024-08-12 18:32:40,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1767980.0, ans=0.05 2024-08-12 18:32:53,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2900, loss[loss=0.1218, beats_loss=0.01055, ecapa_loss=0.0001705, whisper_loss=0.1096, over 16509.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.011, ecapa_loss=0.0001712, whisper_loss=0.09144, over 3917712.14 frames. ], batch size: 65, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:32:58,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1768080.0, ans=0.1 2024-08-12 18:33:07,162 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 18:33:09,904 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 18:33:11,334 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 18:33:13,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2024-08-12 18:33:19,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.482e+01 2.869e+01 3.422e+01 8.599e+01, threshold=5.738e+01, percent-clipped=1.0 2024-08-12 18:33:21,751 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 18:33:27,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1768280.0, ans=0.125 2024-08-12 18:33:30,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-08-12 18:33:38,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1768380.0, ans=0.0 2024-08-12 18:33:52,737 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-12 18:33:54,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1768480.0, ans=0.125 2024-08-12 18:34:00,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 2950, loss[loss=0.1024, beats_loss=0.01253, ecapa_loss=0.0001707, whisper_loss=0.0882, over 22259.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01102, ecapa_loss=0.000172, whisper_loss=0.09051, over 3894724.19 frames. 
], batch size: 91, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:34:05,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1768580.0, ans=0.0 2024-08-12 18:34:52,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=12.0 2024-08-12 18:34:57,989 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-12 18:35:06,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1768980.0, ans=0.0 2024-08-12 18:35:09,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1769080.0, ans=0.0 2024-08-12 18:35:10,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3000, loss[loss=0.1184, beats_loss=0.01135, ecapa_loss=0.0001485, whisper_loss=0.1056, over 21201.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001717, whisper_loss=0.09098, over 3916266.91 frames. ], batch size: 83, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:35:10,461 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 18:35:46,413 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005879, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 18:36:04,748 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on SV_voxceleb1: loss=0.004639, beats_loss=0, ecapa_loss=0.0004639, whisper_loss=0, over 939242.00 frames. 2024-08-12 18:37:53,591 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 18:37:53,599 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 18:38:01,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1769080.0, ans=0.125 2024-08-12 18:38:07,965 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 18:38:18,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.438e+01 2.713e+01 3.016e+01 4.001e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-12 18:38:39,711 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 18:38:45,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1769480.0, ans=0.0 2024-08-12 18:38:52,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2024-08-12 18:38:59,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3050, loss[loss=0.1048, beats_loss=0.01202, ecapa_loss=0.0001934, whisper_loss=0.09084, over 22296.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001719, whisper_loss=0.09158, over 3921729.52 frames. ], batch size: 92, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:39:02,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1769580.0, ans=0.0 2024-08-12 18:39:11,236 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 18:39:22,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1769680.0, ans=0.0 2024-08-12 18:39:36,296 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 18:39:48,683 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 18:39:49,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1769880.0, ans=0.2 2024-08-12 18:39:51,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1769880.0, ans=0.125 2024-08-12 18:39:55,736 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 18:40:09,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3100, loss[loss=0.1148, beats_loss=0.009603, ecapa_loss=0.0001552, whisper_loss=0.1036, over 19371.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001732, whisper_loss=0.0918, over 3895649.45 frames. ], batch size: 72, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:40:19,528 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 18:40:21,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1770080.0, ans=0.0 2024-08-12 18:40:25,245 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 18:40:25,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1770180.0, ans=0.0 2024-08-12 18:40:28,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1770180.0, ans=0.125 2024-08-12 18:40:29,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1770180.0, ans=0.125 2024-08-12 18:40:35,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1770180.0, ans=0.125 2024-08-12 18:40:36,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.497e+01 2.868e+01 3.286e+01 7.289e+01, threshold=5.737e+01, percent-clipped=2.0 2024-08-12 18:40:40,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1770280.0, ans=0.2 2024-08-12 18:40:43,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1770280.0, ans=0.0 2024-08-12 18:40:45,344 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 18:40:52,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1770380.0, ans=0.95 2024-08-12 18:40:53,286 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 18:40:55,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1770380.0, ans=0.125 2024-08-12 18:40:57,972 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
38 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 18:41:08,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1770480.0, ans=0.125 2024-08-12 18:41:21,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3150, loss[loss=0.1153, beats_loss=0.009212, ecapa_loss=0.0002244, whisper_loss=0.1038, over 19750.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0109, ecapa_loss=0.0001738, whisper_loss=0.09277, over 3886839.37 frames. ], batch size: 80, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:41:22,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2024-08-12 18:41:31,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1770580.0, ans=0.125 2024-08-12 18:41:35,671 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 18:41:37,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1770680.0, ans=0.0 2024-08-12 18:41:52,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. 
limit=15.0 2024-08-12 18:41:59,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1770780.0, ans=0.125 2024-08-12 18:42:05,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1770880.0, ans=0.125 2024-08-12 18:42:12,591 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:42:14,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-08-12 18:42:19,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1770980.0, ans=0.2 2024-08-12 18:42:27,293 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:42:27,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1770980.0, ans=0.04949747468305833 2024-08-12 18:42:34,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3200, loss[loss=0.127, beats_loss=0.008106, ecapa_loss=0.0002012, whisper_loss=0.1169, over 15423.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01084, ecapa_loss=0.0001753, whisper_loss=0.09315, over 3875108.17 frames. ], batch size: 61, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:42:37,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-12 18:42:52,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. 
limit=10.0 2024-08-12 18:42:54,783 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:43:02,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.434e+01 2.699e+01 3.191e+01 8.641e+01, threshold=5.397e+01, percent-clipped=3.0 2024-08-12 18:43:03,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1771280.0, ans=0.125 2024-08-12 18:43:19,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1771380.0, ans=0.0 2024-08-12 18:43:20,662 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 18:43:31,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1771480.0, ans=0.125 2024-08-12 18:43:32,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1771480.0, ans=0.0 2024-08-12 18:43:46,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3250, loss[loss=0.09227, beats_loss=0.01294, ecapa_loss=0.0001663, whisper_loss=0.07766, over 15060.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001747, whisper_loss=0.09245, over 3863021.32 frames. ], batch size: 61, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:43:46,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1771580.0, ans=0.1 2024-08-12 18:44:16,649 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 18:44:17,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.81 vs. 
limit=22.5 2024-08-12 18:44:18,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1771780.0, ans=0.0 2024-08-12 18:44:39,102 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 18:44:43,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1771980.0, ans=0.125 2024-08-12 18:44:52,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2024-08-12 18:44:53,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1771980.0, ans=0.0 2024-08-12 18:44:58,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3300, loss[loss=0.09038, beats_loss=0.01216, ecapa_loss=0.0002202, whisper_loss=0.07601, over 20313.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01087, ecapa_loss=0.0001738, whisper_loss=0.09293, over 3865912.36 frames. ], batch size: 90, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:44:59,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=15.0 2024-08-12 18:45:04,991 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 18:45:23,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1772180.0, ans=0.125 2024-08-12 18:45:26,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.521e+01 2.800e+01 3.274e+01 5.621e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 18:45:30,772 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 18:46:10,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1772580.0, ans=0.0 2024-08-12 18:46:11,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3350, loss[loss=0.09189, beats_loss=0.01098, ecapa_loss=0.000207, whisper_loss=0.07885, over 22152.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01085, ecapa_loss=0.0001731, whisper_loss=0.09282, over 3863700.83 frames. ], batch size: 96, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:46:17,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1772580.0, ans=0.2 2024-08-12 18:46:23,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-12 18:46:44,116 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 18:46:48,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1772780.0, ans=0.125 2024-08-12 18:46:51,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1772780.0, ans=0.0 2024-08-12 18:47:07,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-08-12 18:47:10,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=12.0 2024-08-12 18:47:15,864 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
17 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 18:47:22,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3400, loss[loss=0.09711, beats_loss=0.01068, ecapa_loss=0.0001858, whisper_loss=0.08457, over 15164.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001731, whisper_loss=0.09247, over 3839292.30 frames. ], batch size: 60, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:47:37,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1773180.0, ans=0.0 2024-08-12 18:47:50,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.407e+01 2.669e+01 3.067e+01 7.735e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-12 18:48:02,274 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 18:48:19,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-12 18:48:20,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.68 vs. limit=10.0 2024-08-12 18:48:32,750 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 18:48:36,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3450, loss[loss=0.1055, beats_loss=0.01193, ecapa_loss=0.000181, whisper_loss=0.09178, over 22563.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001734, whisper_loss=0.09271, over 3869436.97 frames. 
], batch size: 92, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:48:46,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1773580.0, ans=0.09899494936611666 2024-08-12 18:48:53,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1773680.0, ans=0.125 2024-08-12 18:49:11,467 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 18:49:15,766 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 18:49:17,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=12.0 2024-08-12 18:49:24,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1773880.0, ans=0.125 2024-08-12 18:49:37,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1773980.0, ans=0.05 2024-08-12 18:49:38,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1773980.0, ans=0.125 2024-08-12 18:49:47,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3500, loss[loss=0.107, beats_loss=0.01045, ecapa_loss=0.0001804, whisper_loss=0.09475, over 22675.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.000172, whisper_loss=0.09269, over 3869768.31 frames. 
], batch size: 93, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:50:06,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1774180.0, ans=0.05 2024-08-12 18:50:14,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.564e+01 2.746e+01 3.042e+01 5.198e+01, threshold=5.491e+01, percent-clipped=0.0 2024-08-12 18:50:19,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1774280.0, ans=0.125 2024-08-12 18:50:21,804 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 18:50:26,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1774280.0, ans=0.125 2024-08-12 18:50:40,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-12 18:50:58,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3550, loss[loss=0.11, beats_loss=0.01129, ecapa_loss=0.0001836, whisper_loss=0.09688, over 22084.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001727, whisper_loss=0.09261, over 3904450.35 frames. ], batch size: 89, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:51:11,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1774680.0, ans=0.1 2024-08-12 18:51:12,927 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 18:51:17,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1774680.0, ans=0.5 2024-08-12 18:51:21,617 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 18:51:26,201 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 18:51:27,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-12 18:51:29,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1774780.0, ans=0.2 2024-08-12 18:51:30,281 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 18:51:39,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1774780.0, ans=0.2 2024-08-12 18:51:54,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1774880.0, ans=0.125 2024-08-12 18:51:57,088 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 18:52:11,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3600, loss[loss=0.1025, beats_loss=0.008659, ecapa_loss=0.0002219, whisper_loss=0.0916, over 17628.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01086, ecapa_loss=0.000173, whisper_loss=0.09268, over 3898659.39 frames. ], batch size: 74, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:52:13,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1775080.0, ans=0.035 2024-08-12 18:52:19,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=12.0 2024-08-12 18:52:19,764 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
15 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-12 18:52:27,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-12 18:52:38,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.432e+01 2.743e+01 3.098e+01 5.002e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-12 18:52:43,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1775280.0, ans=0.0 2024-08-12 18:53:10,949 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 18:53:12,255 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 18:53:20,837 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 18:53:23,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3650, loss[loss=0.1054, beats_loss=0.00976, ecapa_loss=0.0001729, whisper_loss=0.09392, over 20015.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.0001729, whisper_loss=0.09229, over 3858884.94 frames. ], batch size: 77, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:53:38,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1775680.0, ans=0.125 2024-08-12 18:54:14,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1775880.0, ans=0.125 2024-08-12 18:54:27,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1775980.0, ans=0.2 2024-08-12 18:54:32,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.85 vs. 
limit=15.0 2024-08-12 18:54:36,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3700, loss[loss=0.1094, beats_loss=0.01084, ecapa_loss=0.0001613, whisper_loss=0.09692, over 20684.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001732, whisper_loss=0.09168, over 3847822.44 frames. ], batch size: 83, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:54:42,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1776080.0, ans=0.1 2024-08-12 18:54:45,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1776080.0, ans=0.125 2024-08-12 18:54:46,531 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-12 18:55:02,150 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 18:55:03,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.392e+01 2.654e+01 3.110e+01 5.350e+01, threshold=5.308e+01, percent-clipped=0.0 2024-08-12 18:55:24,483 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.242e+02 2024-08-12 18:55:28,121 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 18:55:48,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3750, loss[loss=0.08712, beats_loss=0.01206, ecapa_loss=0.0001668, whisper_loss=0.0734, over 18071.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01102, ecapa_loss=0.0001721, whisper_loss=0.09106, over 3864208.98 frames. ], batch size: 71, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:56:09,666 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 18:56:18,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1776780.0, ans=0.125 2024-08-12 18:56:53,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=8.0 2024-08-12 18:56:58,960 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 18:57:03,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3800, loss[loss=0.123, beats_loss=0.008242, ecapa_loss=0.0001651, whisper_loss=0.1131, over 20599.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001721, whisper_loss=0.09136, over 3872685.48 frames. ], batch size: 80, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:57:07,434 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 18:57:10,178 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 18:57:18,197 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 35 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 18:57:21,359 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 18:57:31,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.486e+01 2.799e+01 3.183e+01 6.177e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 18:57:35,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1777280.0, ans=0.1 2024-08-12 18:57:42,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2024-08-12 18:58:05,969 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 18:58:14,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1777480.0, ans=0.1 2024-08-12 18:58:19,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3850, loss[loss=0.08316, beats_loss=0.01355, ecapa_loss=0.0001664, whisper_loss=0.06795, over 16071.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001734, whisper_loss=0.09207, over 3868231.15 frames. ], batch size: 67, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:58:26,484 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 18:58:39,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1777680.0, ans=0.2 2024-08-12 18:58:57,270 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-12 18:58:57,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1777780.0, ans=0.125 2024-08-12 18:59:06,530 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-12 18:59:24,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1777980.0, ans=0.125 2024-08-12 18:59:25,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1777980.0, ans=0.0 2024-08-12 18:59:28,122 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 31 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 18:59:36,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3900, loss[loss=0.09895, beats_loss=0.009014, ecapa_loss=0.0002172, whisper_loss=0.08776, over 13255.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01096, ecapa_loss=0.0001736, whisper_loss=0.09267, over 3894440.36 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:59:53,987 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 18:59:56,595 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 19:00:05,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.460e+01 2.720e+01 3.134e+01 5.284e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 19:00:13,358 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 19:00:20,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1778380.0, ans=0.125 2024-08-12 19:00:32,283 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 19:00:37,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1778480.0, ans=0.0 2024-08-12 19:00:38,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1778480.0, ans=0.0 2024-08-12 19:00:41,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1778480.0, ans=0.025 2024-08-12 19:00:48,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1778480.0, ans=0.1 2024-08-12 19:00:52,565 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 19:00:53,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 3950, loss[loss=0.1106, beats_loss=0.01173, ecapa_loss=0.0001613, whisper_loss=0.0973, over 19685.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.000173, whisper_loss=0.09253, over 3925114.55 frames. ], batch size: 78, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:01:21,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-08-12 19:01:40,818 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 14 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 19:01:41,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1778880.0, ans=10.0 2024-08-12 19:01:44,104 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 19:01:49,736 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 19:02:00,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1778980.0, ans=0.0 2024-08-12 19:02:02,782 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 19:02:08,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1779080.0, ans=0.2 2024-08-12 19:02:08,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4000, loss[loss=0.0998, beats_loss=0.0105, ecapa_loss=0.0001587, whisper_loss=0.08771, over 20591.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001722, whisper_loss=0.09162, over 3868517.95 frames. 
], batch size: 82, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:02:12,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1779080.0, ans=0.0 2024-08-12 19:02:14,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1779080.0, ans=0.2 2024-08-12 19:02:15,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1779080.0, ans=0.09899494936611666 2024-08-12 19:02:30,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1779180.0, ans=0.1 2024-08-12 19:02:32,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1779180.0, ans=0.1 2024-08-12 19:02:33,308 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 14 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 19:02:39,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.414e+01 2.670e+01 2.988e+01 4.666e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-12 19:02:52,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1779280.0, ans=0.2 2024-08-12 19:03:02,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.91 vs. 
limit=22.5 2024-08-12 19:03:03,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1779380.0, ans=0.125 2024-08-12 19:03:20,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1779480.0, ans=0.125 2024-08-12 19:03:28,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1779580.0, ans=0.125 2024-08-12 19:03:29,166 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4050, loss[loss=0.08036, beats_loss=0.01028, ecapa_loss=0.0001898, whisper_loss=0.06819, over 16665.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.0001725, whisper_loss=0.09206, over 3862676.28 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:03:29,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1779580.0, ans=0.125 2024-08-12 19:03:32,011 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 19:03:43,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1779680.0, ans=0.125 2024-08-12 19:03:47,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1779680.0, ans=0.05 2024-08-12 19:04:01,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2024-08-12 19:04:48,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4100, loss[loss=0.1057, beats_loss=0.008748, ecapa_loss=0.0001765, whisper_loss=0.09521, over 16358.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001738, whisper_loss=0.09253, over 3895809.54 frames. ], batch size: 64, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:04:56,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1780080.0, ans=0.125 2024-08-12 19:04:59,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1780080.0, ans=0.2 2024-08-12 19:05:02,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1780180.0, ans=0.125 2024-08-12 19:05:14,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1780180.0, ans=0.0 2024-08-12 19:05:16,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.487e+01 2.905e+01 3.188e+01 5.523e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-12 19:05:19,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1780280.0, ans=0.125 2024-08-12 19:05:24,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1780280.0, ans=0.0 2024-08-12 19:05:38,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1780380.0, ans=0.2 2024-08-12 19:05:46,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1780380.0, ans=0.125 2024-08-12 19:06:04,733 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 19:06:07,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4150, loss[loss=0.1276, beats_loss=0.01008, ecapa_loss=0.0001794, whisper_loss=0.1157, over 17068.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01088, ecapa_loss=0.000176, whisper_loss=0.09311, over 3890554.47 frames. ], batch size: 67, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:06:23,449 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 21 from Vox, 52 fro AS 2024-08-12 19:06:34,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0 2024-08-12 19:06:43,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1780780.0, ans=0.125 2024-08-12 19:06:44,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1780780.0, ans=0.125 2024-08-12 19:07:23,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1780980.0, ans=0.125 2024-08-12 19:07:26,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4200, loss[loss=0.1136, beats_loss=0.01205, ecapa_loss=0.0001777, whisper_loss=0.09973, over 23695.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01094, ecapa_loss=0.0001753, whisper_loss=0.09275, over 3905935.21 frames. ], batch size: 95, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:07:55,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1781180.0, ans=0.1 2024-08-12 19:07:56,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.440e+01 2.909e+01 3.594e+01 1.116e+02, threshold=5.819e+01, percent-clipped=3.0 2024-08-12 19:07:57,456 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 19:08:02,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1781280.0, ans=0.125 2024-08-12 19:08:08,721 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:08:37,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1781480.0, ans=0.125 2024-08-12 19:08:46,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-12 19:08:49,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4250, loss[loss=0.09961, beats_loss=0.01152, ecapa_loss=0.000174, whisper_loss=0.08635, over 21373.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01097, ecapa_loss=0.0001739, whisper_loss=0.09253, over 3936747.75 frames. ], batch size: 86, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:08:57,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1781580.0, ans=0.0 2024-08-12 19:09:38,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=12.0 2024-08-12 19:09:40,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1781880.0, ans=0.125 2024-08-12 19:09:52,339 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 19:10:08,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. 
limit=10.0 2024-08-12 19:10:08,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4300, loss[loss=0.1151, beats_loss=0.009205, ecapa_loss=0.0001593, whisper_loss=0.1043, over 24163.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.000173, whisper_loss=0.09167, over 3942697.81 frames. ], batch size: 89, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:10:12,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1782080.0, ans=0.125 2024-08-12 19:10:16,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1782080.0, ans=0.2 2024-08-12 19:10:27,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-08-12 19:10:31,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1782180.0, ans=0.0 2024-08-12 19:10:32,745 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 19:10:37,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.369e+01 2.676e+01 2.998e+01 4.612e+01, threshold=5.352e+01, percent-clipped=0.0 2024-08-12 19:10:40,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1782280.0, ans=0.2 2024-08-12 19:10:40,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=22.5 2024-08-12 19:10:47,464 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 26 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-12 19:10:50,466 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 19:11:05,103 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 19:11:15,468 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 19:11:27,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4350, loss[loss=0.1254, beats_loss=0.007283, ecapa_loss=0.0002014, whisper_loss=0.1161, over 23065.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001727, whisper_loss=0.09119, over 3916014.03 frames. ], batch size: 89, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:12:14,493 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 19:12:22,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1782880.0, ans=0.95 2024-08-12 19:12:40,543 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 19:12:49,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4400, loss[loss=0.0941, beats_loss=0.01093, ecapa_loss=0.0001574, whisper_loss=0.0816, over 17023.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001728, whisper_loss=0.09161, over 3936955.56 frames. ], batch size: 68, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:12:51,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1783080.0, ans=0.125 2024-08-12 19:13:02,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. 
limit=22.5 2024-08-12 19:13:03,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783080.0, ans=0.1 2024-08-12 19:13:10,584 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 19:13:21,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.411e+01 2.660e+01 2.962e+01 4.713e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-12 19:13:24,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1783280.0, ans=0.0 2024-08-12 19:13:33,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1783280.0, ans=0.1 2024-08-12 19:13:48,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1783380.0, ans=0.125 2024-08-12 19:13:50,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-08-12 19:14:04,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1783480.0, ans=0.1 2024-08-12 19:14:05,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1783480.0, ans=0.2 2024-08-12 19:14:13,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4450, loss[loss=0.1056, beats_loss=0.01131, ecapa_loss=0.0001781, whisper_loss=0.09252, over 20783.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001734, whisper_loss=0.09134, over 3913088.69 frames. 
], batch size: 83, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:14:22,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1783580.0, ans=0.125 2024-08-12 19:14:30,053 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 19:14:42,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1783680.0, ans=0.0 2024-08-12 19:14:50,164 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 19:15:06,055 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 15 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 19:15:25,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1783980.0, ans=0.125 2024-08-12 19:15:38,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1783980.0, ans=0.125 2024-08-12 19:15:41,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4500, loss[loss=0.115, beats_loss=0.009781, ecapa_loss=0.0001828, whisper_loss=0.1034, over 18386.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001729, whisper_loss=0.09129, over 3912080.68 frames. 
], batch size: 73, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:15:46,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1784080.0, ans=0.125 2024-08-12 19:15:46,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1784080.0, ans=0.0 2024-08-12 19:15:47,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1784080.0, ans=0.125 2024-08-12 19:16:13,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.482e+01 2.920e+01 3.537e+01 6.104e+01, threshold=5.841e+01, percent-clipped=3.0 2024-08-12 19:16:16,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1784280.0, ans=0.125 2024-08-12 19:16:35,519 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 19:17:04,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.416e+01 2024-08-12 19:17:07,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4550, loss[loss=0.102, beats_loss=0.01142, ecapa_loss=0.0001576, whisper_loss=0.08901, over 19259.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001753, whisper_loss=0.09166, over 3902528.99 frames. ], batch size: 77, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:17:09,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2024-08-12 19:17:22,738 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
29 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 19:17:42,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1784780.0, ans=0.1 2024-08-12 19:17:49,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1784780.0, ans=0.07 2024-08-12 19:17:49,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1784780.0, ans=0.2 2024-08-12 19:17:52,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1784780.0, ans=0.2 2024-08-12 19:17:53,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-12 19:17:57,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1784780.0, ans=0.125 2024-08-12 19:18:08,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1784880.0, ans=0.0 2024-08-12 19:18:33,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4600, loss[loss=0.1055, beats_loss=0.01228, ecapa_loss=0.0001469, whisper_loss=0.09178, over 16085.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001743, whisper_loss=0.09184, over 3928308.13 frames. 
], batch size: 61, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:18:49,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1785180.0, ans=0.2 2024-08-12 19:19:04,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.452e+01 2.765e+01 3.164e+01 4.953e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-12 19:19:10,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1785280.0, ans=0.04949747468305833 2024-08-12 19:19:14,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2024-08-12 19:19:35,997 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:19:48,037 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 19:19:52,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4650, loss[loss=0.07933, beats_loss=0.0134, ecapa_loss=0.0001456, whisper_loss=0.06448, over 20107.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01095, ecapa_loss=0.0001743, whisper_loss=0.09125, over 3904917.71 frames. ], batch size: 82, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:20:06,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1785580.0, ans=0.1 2024-08-12 19:20:18,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1785680.0, ans=0.0 2024-08-12 19:20:25,830 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 19:20:27,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1785780.0, ans=0.125 2024-08-12 19:20:43,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1785880.0, ans=0.125 2024-08-12 19:20:57,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-12 19:21:03,796 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 19:21:10,306 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 19:21:12,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4700, loss[loss=0.08537, beats_loss=0.01175, ecapa_loss=0.0001546, whisper_loss=0.07208, over 17525.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01097, ecapa_loss=0.0001737, whisper_loss=0.09203, over 3890924.72 frames. ], batch size: 69, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:21:14,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1786080.0, ans=0.125 2024-08-12 19:21:18,876 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 19:21:35,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1786180.0, ans=0.125 2024-08-12 19:21:35,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. 
limit=15.0 2024-08-12 19:21:43,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.537e+01 2.789e+01 3.116e+01 4.712e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-12 19:21:46,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1786280.0, ans=0.0 2024-08-12 19:22:05,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1786380.0, ans=0.1 2024-08-12 19:22:13,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1786380.0, ans=0.125 2024-08-12 19:22:17,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2024-08-12 19:22:32,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4750, loss[loss=0.1016, beats_loss=0.01285, ecapa_loss=0.0002126, whisper_loss=0.08659, over 17230.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001741, whisper_loss=0.09209, over 3879316.22 frames. ], batch size: 73, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:22:54,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1786680.0, ans=0.125 2024-08-12 19:23:01,606 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 19:23:23,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2024-08-12 19:23:38,230 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
36 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 19:23:50,079 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.067e-02 2024-08-12 19:23:50,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4800, loss[loss=0.1049, beats_loss=0.01088, ecapa_loss=0.0001866, whisper_loss=0.09218, over 21288.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01097, ecapa_loss=0.0001737, whisper_loss=0.09251, over 3905458.04 frames. ], batch size: 84, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:23:52,931 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 19:24:15,723 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 19:24:20,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.537e+01 2.789e+01 3.212e+01 6.421e+01, threshold=5.577e+01, percent-clipped=1.0 2024-08-12 19:24:32,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2024-08-12 19:24:43,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1787380.0, ans=0.025 2024-08-12 19:24:47,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1787380.0, ans=0.0 2024-08-12 19:25:04,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1787480.0, ans=0.125 2024-08-12 19:25:08,172 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 19:25:10,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4850, loss[loss=0.112, beats_loss=0.0104, ecapa_loss=0.0001591, whisper_loss=0.1, over 22904.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01096, ecapa_loss=0.0001744, whisper_loss=0.09223, over 3921732.43 frames. ], batch size: 91, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:25:11,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1787580.0, ans=0.1 2024-08-12 19:25:13,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1787580.0, ans=0.125 2024-08-12 19:25:25,907 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 19:25:29,203 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-12 19:26:05,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1787880.0, ans=0.1 2024-08-12 19:26:14,800 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:26:19,624 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 19:26:22,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-12 19:26:23,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1787980.0, ans=0.02 2024-08-12 19:26:24,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-12 19:26:33,852 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 27 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 19:26:35,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4900, loss[loss=0.1014, beats_loss=0.01175, ecapa_loss=0.0002056, whisper_loss=0.0876, over 23656.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001733, whisper_loss=0.0919, over 3934995.39 frames. ], batch size: 96, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:26:47,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1788080.0, ans=0.0 2024-08-12 19:26:51,022 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 19:26:59,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1788180.0, ans=0.0 2024-08-12 19:27:06,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.493e+01 2.714e+01 3.066e+01 4.979e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-12 19:27:22,678 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-12 19:27:44,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2024-08-12 19:27:50,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1788480.0, ans=0.125 2024-08-12 19:27:53,900 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 19:27:56,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 4950, loss[loss=0.1006, beats_loss=0.01032, ecapa_loss=0.0001879, whisper_loss=0.08837, over 21778.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.011, ecapa_loss=0.0001728, whisper_loss=0.09155, over 3913792.28 frames. 
], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:28:12,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1788680.0, ans=0.0 2024-08-12 19:28:13,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1788680.0, ans=15.0 2024-08-12 19:28:16,633 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 19:28:18,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1788680.0, ans=0.125 2024-08-12 19:28:27,692 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 19:28:35,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.39 vs. limit=10.0 2024-08-12 19:28:36,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1788780.0, ans=15.0 2024-08-12 19:28:44,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1788880.0, ans=0.1 2024-08-12 19:29:01,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2024-08-12 19:29:03,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1788980.0, ans=0.2 2024-08-12 19:29:07,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2024-08-12 19:29:09,629 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 19:29:14,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1789080.0, ans=0.04949747468305833 2024-08-12 19:29:15,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5000, loss[loss=0.1027, beats_loss=0.01001, ecapa_loss=0.0002066, whisper_loss=0.09061, over 19030.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01101, ecapa_loss=0.0001744, whisper_loss=0.09109, over 3902760.13 frames. ], batch size: 80, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:29:21,799 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 19:29:38,897 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:29:41,899 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 19:29:47,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1789180.0, ans=0.125 2024-08-12 19:29:47,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.488e+01 2.839e+01 3.204e+01 5.431e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 19:29:49,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1789280.0, ans=0.0 2024-08-12 19:29:52,614 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 19:29:56,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1789280.0, ans=0.125 2024-08-12 19:30:02,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. 
limit=15.0 2024-08-12 19:30:02,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1789280.0, ans=0.125 2024-08-12 19:30:17,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1789380.0, ans=10.0 2024-08-12 19:30:20,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1789380.0, ans=0.125 2024-08-12 19:30:22,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-12 19:30:27,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=12.0 2024-08-12 19:30:28,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-12 19:30:38,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5050, loss[loss=0.1143, beats_loss=0.01032, ecapa_loss=0.0001679, whisper_loss=0.1023, over 23103.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01107, ecapa_loss=0.0001734, whisper_loss=0.09161, over 3924676.03 frames. ], batch size: 93, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:30:40,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1789580.0, ans=0.125 2024-08-12 19:30:44,914 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 19:30:59,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1789680.0, ans=0.1 2024-08-12 19:31:01,321 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 19:31:20,657 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 19:31:26,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1789880.0, ans=0.2 2024-08-12 19:31:37,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.15 vs. limit=10.0 2024-08-12 19:31:40,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1789980.0, ans=0.0 2024-08-12 19:31:43,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-08-12 19:31:54,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5100, loss[loss=0.1133, beats_loss=0.01032, ecapa_loss=0.0001263, whisper_loss=0.1018, over 20010.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001717, whisper_loss=0.09187, over 3931898.58 frames. ], batch size: 76, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:31:59,817 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 19:32:03,428 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 19:32:19,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-08-12 19:32:20,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.471e+01 2.779e+01 3.135e+01 9.153e+01, threshold=5.559e+01, percent-clipped=1.0 2024-08-12 19:32:23,105 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 19:32:36,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1790380.0, ans=0.125 2024-08-12 19:32:38,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1790380.0, ans=0.2 2024-08-12 19:32:45,393 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 19:32:56,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.91 vs. limit=15.0 2024-08-12 19:33:02,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5150, loss[loss=0.104, beats_loss=0.01298, ecapa_loss=0.0001774, whisper_loss=0.08926, over 22986.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.0001711, whisper_loss=0.09145, over 3891977.51 frames. ], batch size: 93, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:33:26,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-12 19:33:29,998 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 19:33:55,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1790980.0, ans=0.1 2024-08-12 19:33:58,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-12 19:34:09,316 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 19:34:10,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5200, loss[loss=0.092, beats_loss=0.01284, ecapa_loss=0.0001605, whisper_loss=0.07756, over 21686.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001699, whisper_loss=0.09213, over 3921511.87 frames. ], batch size: 91, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:34:15,107 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.340e-01 2024-08-12 19:34:17,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1791080.0, ans=0.0 2024-08-12 19:34:19,868 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 19:34:20,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1791080.0, ans=0.125 2024-08-12 19:34:36,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.499e+01 2.713e+01 3.001e+01 1.517e+02, threshold=5.426e+01, percent-clipped=1.0 2024-08-12 19:34:43,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1791280.0, ans=0.125 2024-08-12 19:34:56,438 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 19:34:56,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1791380.0, ans=0.125 2024-08-12 19:35:04,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1791480.0, ans=0.1 2024-08-12 19:35:06,839 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 19:35:07,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1791480.0, ans=0.125 2024-08-12 19:35:11,338 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 19:35:19,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5250, loss[loss=0.1165, beats_loss=0.01165, ecapa_loss=0.0001751, whisper_loss=0.1031, over 22418.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001709, whisper_loss=0.09235, over 3893934.41 frames. ], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:35:27,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-12 19:35:28,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1791580.0, ans=0.1 2024-08-12 19:35:38,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-12 19:35:47,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1791780.0, ans=0.125 2024-08-12 19:36:26,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-12 19:36:28,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5300, loss[loss=0.1233, beats_loss=0.01049, ecapa_loss=0.0001373, whisper_loss=0.1115, over 24842.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001724, whisper_loss=0.09239, over 3907576.74 frames. 
], batch size: 94, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:36:54,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.416e+01 2.797e+01 3.236e+01 7.041e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 19:36:58,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1792280.0, ans=0.2 2024-08-12 19:36:58,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1792280.0, ans=0.0 2024-08-12 19:37:15,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2024-08-12 19:37:17,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0 2024-08-12 19:37:18,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1792380.0, ans=0.0 2024-08-12 19:37:35,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5350, loss[loss=0.0932, beats_loss=0.01132, ecapa_loss=0.0001563, whisper_loss=0.08032, over 19733.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01101, ecapa_loss=0.0001705, whisper_loss=0.09111, over 3898676.03 frames. ], batch size: 75, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:37:37,350 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 19:37:47,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1792580.0, ans=0.0 2024-08-12 19:37:59,573 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
29 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-12 19:37:59,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1792680.0, ans=0.125 2024-08-12 19:38:28,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-12 19:38:28,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-12 19:38:37,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1792980.0, ans=0.0 2024-08-12 19:38:44,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5400, loss[loss=0.09882, beats_loss=0.009173, ecapa_loss=0.0002126, whisper_loss=0.08752, over 15523.00 frames. ], tot_loss[loss=0.104, beats_loss=0.011, ecapa_loss=0.00017, whisper_loss=0.09128, over 3890482.11 frames. ], batch size: 64, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:38:51,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1793080.0, ans=0.1 2024-08-12 19:39:09,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.477e+01 2.760e+01 3.199e+01 8.149e+01, threshold=5.520e+01, percent-clipped=2.0 2024-08-12 19:39:13,247 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 19:39:32,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-12 19:39:53,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5450, loss[loss=0.114, beats_loss=0.009304, ecapa_loss=0.0002021, whisper_loss=0.1027, over 20632.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001714, whisper_loss=0.09196, over 3898673.49 frames. ], batch size: 83, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:39:56,594 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 19:40:07,036 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:40:23,322 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-12 19:40:34,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1793780.0, ans=0.125 2024-08-12 19:40:55,251 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-12 19:41:02,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1793980.0, ans=0.0 2024-08-12 19:41:06,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5500, loss[loss=0.122, beats_loss=0.01201, ecapa_loss=0.0001282, whisper_loss=0.1087, over 22564.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001705, whisper_loss=0.09161, over 3910486.65 frames. 
], batch size: 86, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:41:32,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.427e+01 2.827e+01 3.059e+01 4.853e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 19:41:36,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1794280.0, ans=0.125 2024-08-12 19:41:41,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1794280.0, ans=0.125 2024-08-12 19:41:47,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1794280.0, ans=0.125 2024-08-12 19:42:24,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5550, loss[loss=0.09507, beats_loss=0.009954, ecapa_loss=0.0002179, whisper_loss=0.08294, over 17935.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01103, ecapa_loss=0.0001705, whisper_loss=0.09064, over 3908057.74 frames. ], batch size: 77, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:42:51,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1794680.0, ans=0.1 2024-08-12 19:42:51,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1794680.0, ans=10.0 2024-08-12 19:42:52,883 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 19:43:04,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1794780.0, ans=0.2 2024-08-12 19:43:34,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. 
limit=15.0 2024-08-12 19:43:43,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1794980.0, ans=0.0 2024-08-12 19:43:44,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1794980.0, ans=0.2 2024-08-12 19:43:47,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1794980.0, ans=0.0 2024-08-12 19:43:49,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5600, loss[loss=0.07178, beats_loss=0.01289, ecapa_loss=0.0001616, whisper_loss=0.05727, over 14449.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01109, ecapa_loss=0.0001705, whisper_loss=0.09055, over 3915989.99 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:43:51,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1795080.0, ans=0.0 2024-08-12 19:44:10,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1795180.0, ans=0.2 2024-08-12 19:44:24,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.502e+01 2.768e+01 3.142e+01 4.658e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 19:44:38,259 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-12 19:44:44,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1795380.0, ans=0.0 2024-08-12 19:45:22,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5650, loss[loss=0.112, beats_loss=0.01017, ecapa_loss=0.0001535, whisper_loss=0.1003, over 20765.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0111, ecapa_loss=0.0001701, whisper_loss=0.09105, over 3938244.39 frames. 
], batch size: 82, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:45:29,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1795580.0, ans=0.125 2024-08-12 19:45:35,289 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 19:45:36,482 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-12 19:45:41,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-08-12 19:45:57,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1795680.0, ans=0.0 2024-08-12 19:45:57,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.20 vs. 
limit=10.0 2024-08-12 19:46:12,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1795780.0, ans=0.09899494936611666 2024-08-12 19:46:36,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1795980.0, ans=0.2 2024-08-12 19:46:46,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1795980.0, ans=0.0 2024-08-12 19:46:48,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1795980.0, ans=0.0 2024-08-12 19:46:56,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1796080.0, ans=0.0 2024-08-12 19:46:56,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5700, loss[loss=0.07262, beats_loss=0.01096, ecapa_loss=0.0001885, whisper_loss=0.05978, over 14372.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.0001712, whisper_loss=0.09136, over 3930034.35 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:47:07,788 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 19:47:27,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1796180.0, ans=0.0 2024-08-12 19:47:33,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.533e+01 2.876e+01 3.216e+01 4.377e+01, threshold=5.753e+01, percent-clipped=0.0 2024-08-12 19:47:58,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.72 vs. 
limit=22.5 2024-08-12 19:48:08,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1796380.0, ans=0.1 2024-08-12 19:48:10,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=12.0 2024-08-12 19:48:21,119 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 19:48:30,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5750, loss[loss=0.1053, beats_loss=0.008247, ecapa_loss=0.0001778, whisper_loss=0.0953, over 15428.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.0001728, whisper_loss=0.09225, over 3931997.93 frames. ], batch size: 61, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:48:31,059 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 19:48:34,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1796580.0, ans=0.125 2024-08-12 19:48:36,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1796580.0, ans=0.125 2024-08-12 19:48:37,381 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 19:49:04,857 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 19:49:08,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1796680.0, ans=0.0 2024-08-12 19:49:09,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1796780.0, ans=0.1 2024-08-12 19:49:36,892 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:49:36,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1796880.0, ans=0.125 2024-08-12 19:49:39,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-12 19:50:00,469 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 19:50:01,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5800, loss[loss=0.08278, beats_loss=0.01404, ecapa_loss=0.0001413, whisper_loss=0.06732, over 20084.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.000173, whisper_loss=0.09159, over 3917468.98 frames. ], batch size: 84, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:50:05,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1797080.0, ans=0.125 2024-08-12 19:50:11,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1797080.0, ans=0.125 2024-08-12 19:50:27,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.440e+01 2.724e+01 3.167e+01 6.575e+01, threshold=5.447e+01, percent-clipped=2.0 2024-08-12 19:50:30,900 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 19:50:38,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1797280.0, ans=0.0 2024-08-12 19:50:50,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1797380.0, ans=0.125 2024-08-12 19:50:57,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1797380.0, ans=0.125 2024-08-12 19:51:06,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2024-08-12 19:51:07,525 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-12 19:51:10,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-12 19:51:11,578 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 19:51:14,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5850, loss[loss=0.1149, beats_loss=0.01051, ecapa_loss=0.0001719, whisper_loss=0.1026, over 18067.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001731, whisper_loss=0.09142, over 3910027.98 frames. 
], batch size: 72, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:51:21,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1797580.0, ans=0.0 2024-08-12 19:51:28,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1797680.0, ans=0.0 2024-08-12 19:51:31,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1797680.0, ans=0.125 2024-08-12 19:51:37,696 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 19:51:40,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2024-08-12 19:51:42,066 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 19:51:43,684 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 19:51:55,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-08-12 19:51:56,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1797880.0, ans=0.125 2024-08-12 19:52:22,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1797980.0, ans=0.1 2024-08-12 19:52:24,119 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 19:52:26,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5900, loss[loss=0.09349, beats_loss=0.01239, ecapa_loss=0.0001315, whisper_loss=0.07979, over 14555.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001722, whisper_loss=0.09128, over 3881642.09 frames. ], batch size: 54, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:52:30,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-12 19:52:54,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.654e+01 2.967e+01 3.336e+01 4.788e+01, threshold=5.934e+01, percent-clipped=0.0 2024-08-12 19:53:02,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1798280.0, ans=0.125 2024-08-12 19:53:06,677 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 19:53:09,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1798380.0, ans=0.125 2024-08-12 19:53:19,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1798380.0, ans=0.0 2024-08-12 19:53:34,429 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:53:38,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 5950, loss[loss=0.09083, beats_loss=0.01056, ecapa_loss=0.0001424, whisper_loss=0.07885, over 14352.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01094, ecapa_loss=0.0001728, whisper_loss=0.09148, over 3866611.62 frames. ], batch size: 54, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:53:45,505 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 19:53:51,711 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 19:53:52,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.90 vs. limit=22.5 2024-08-12 19:54:01,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1798680.0, ans=0.125 2024-08-12 19:54:22,415 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 19:54:32,784 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-12 19:54:49,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1798980.0, ans=0.0 2024-08-12 19:54:52,646 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 19:54:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1798980.0, ans=0.0 2024-08-12 19:54:55,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6000, loss[loss=0.1147, beats_loss=0.01251, ecapa_loss=0.0001849, whisper_loss=0.1004, over 21718.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001718, whisper_loss=0.09159, over 3901888.27 frames. ], batch size: 88, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:54:55,083 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 19:55:26,057 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8658, 2.5676, 3.1940, 3.3945], device='cuda:3') 2024-08-12 19:55:33,580 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005899, whisper_loss=0.2486, over 922467.00 frames. 
2024-08-12 19:55:50,073 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on SV_voxceleb1: loss=0.004696, beats_loss=0, ecapa_loss=0.0004696, whisper_loss=0, over 939242.00 frames. 2024-08-12 19:56:56,517 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8373, 1.7079, 2.0419, 1.1820], device='cuda:3') 2024-08-12 19:57:46,554 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on AT_audioset: loss=0.02428, beats_loss=0.02428, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 19:57:46,558 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 19:57:48,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1799080.0, ans=0.0 2024-08-12 19:57:51,134 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:57:59,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1799180.0, ans=0.2 2024-08-12 19:58:01,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1799180.0, ans=0.1 2024-08-12 19:58:10,306 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 19:58:15,385 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.491e-01 2024-08-12 19:58:16,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.501e+01 2.791e+01 3.141e+01 5.827e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-12 19:58:38,678 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.022e-01 2024-08-12 19:58:44,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1799380.0, ans=0.025 2024-08-12 19:58:55,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1799480.0, ans=0.125 2024-08-12 19:59:01,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1799480.0, ans=0.2 2024-08-12 19:59:04,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6050, loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001908, whisper_loss=0.08998, over 22652.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.0001709, whisper_loss=0.09151, over 3907748.07 frames. ], batch size: 92, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:59:05,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1799580.0, ans=0.125 2024-08-12 19:59:09,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. 
limit=22.5 2024-08-12 19:59:17,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1799580.0, ans=0.125 2024-08-12 19:59:23,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1799680.0, ans=0.0 2024-08-12 19:59:32,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-08-12 19:59:37,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1799780.0, ans=0.125 2024-08-12 19:59:41,749 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 19:59:42,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0 2024-08-12 19:59:43,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1799780.0, ans=0.125 2024-08-12 19:59:49,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1799880.0, ans=0.125 2024-08-12 19:59:56,097 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 20:00:00,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1799880.0, ans=0.125 2024-08-12 20:00:03,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. 
limit=22.5 2024-08-12 20:00:08,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1799980.0, ans=0.025 2024-08-12 20:00:21,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.31 vs. limit=10.0 2024-08-12 20:00:24,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6100, loss[loss=0.07348, beats_loss=0.01389, ecapa_loss=0.0001895, whisper_loss=0.0577, over 21753.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001726, whisper_loss=0.09165, over 3929524.48 frames. ], batch size: 93, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:00:52,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1800180.0, ans=0.125 2024-08-12 20:00:55,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.407e+01 2.685e+01 3.141e+01 4.380e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 20:00:57,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1800280.0, ans=0.125 2024-08-12 20:00:58,877 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:01:39,418 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 20:01:42,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6150, loss[loss=0.135, beats_loss=0.00922, ecapa_loss=0.0001424, whisper_loss=0.1243, over 22148.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001722, whisper_loss=0.09176, over 3919845.97 frames. 
], batch size: 80, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:01:45,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1800580.0, ans=0.125 2024-08-12 20:01:58,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-12 20:02:58,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6200, loss[loss=0.1167, beats_loss=0.01036, ecapa_loss=0.0001317, whisper_loss=0.1051, over 20408.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01099, ecapa_loss=0.0001719, whisper_loss=0.09157, over 3906990.25 frames. ], batch size: 75, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:03:01,520 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 20:03:01,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1801080.0, ans=0.2 2024-08-12 20:03:15,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-08-12 20:03:24,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1801180.0, ans=0.0 2024-08-12 20:03:27,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.462e+01 2.878e+01 3.273e+01 2.094e+02, threshold=5.757e+01, percent-clipped=3.0 2024-08-12 20:03:48,230 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 20:03:58,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1801480.0, ans=0.125 2024-08-12 20:04:12,764 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 20:04:13,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6250, loss[loss=0.08779, beats_loss=0.0144, ecapa_loss=0.0001397, whisper_loss=0.07199, over 22067.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.000172, whisper_loss=0.09192, over 3908042.02 frames. ], batch size: 93, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:04:23,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-12 20:04:27,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1801680.0, ans=0.125 2024-08-12 20:04:39,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1801680.0, ans=0.125 2024-08-12 20:04:56,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1801880.0, ans=10.0 2024-08-12 20:05:06,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.21 vs. limit=22.5 2024-08-12 20:05:14,027 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 20:05:14,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1801980.0, ans=0.125 2024-08-12 20:05:28,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6300, loss[loss=0.09656, beats_loss=0.007911, ecapa_loss=0.0002097, whisper_loss=0.08655, over 16893.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001716, whisper_loss=0.09118, over 3897403.55 frames. 
], batch size: 69, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:05:39,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-08-12 20:05:54,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1802180.0, ans=0.2 2024-08-12 20:05:57,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.436e+01 2.696e+01 3.138e+01 5.310e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-12 20:06:03,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1802280.0, ans=0.0 2024-08-12 20:06:05,218 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 20:06:15,029 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 20:06:43,443 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6350, loss[loss=0.09821, beats_loss=0.01115, ecapa_loss=0.000159, whisper_loss=0.08547, over 16653.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001729, whisper_loss=0.09142, over 3871570.05 frames. ], batch size: 65, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:06:43,637 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 20:06:43,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1802580.0, ans=0.2 2024-08-12 20:07:10,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1802680.0, ans=0.0 2024-08-12 20:07:25,317 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:07:34,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1802880.0, ans=0.05 2024-08-12 20:07:38,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1802880.0, ans=0.125 2024-08-12 20:07:40,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1802880.0, ans=0.125 2024-08-12 20:07:53,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1802980.0, ans=0.0 2024-08-12 20:07:57,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6400, loss[loss=0.09469, beats_loss=0.01226, ecapa_loss=0.0002236, whisper_loss=0.0802, over 17640.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.09206, over 3892966.16 frames. 
], batch size: 75, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:08:16,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1803180.0, ans=0.09899494936611666 2024-08-12 20:08:24,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.558e+01 2.846e+01 3.413e+01 1.173e+02, threshold=5.692e+01, percent-clipped=2.0 2024-08-12 20:08:25,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1803280.0, ans=0.05 2024-08-12 20:08:28,133 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 20:08:28,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1803280.0, ans=0.1 2024-08-12 20:08:31,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1803280.0, ans=0.0 2024-08-12 20:08:47,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1803380.0, ans=0.2 2024-08-12 20:09:02,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1803480.0, ans=0.0 2024-08-12 20:09:08,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6450, loss[loss=0.1135, beats_loss=0.01049, ecapa_loss=0.0001934, whisper_loss=0.1011, over 18443.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001704, whisper_loss=0.092, over 3897363.99 frames. ], batch size: 74, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:09:09,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.34 vs. 
limit=22.5 2024-08-12 20:09:12,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1803580.0, ans=0.025 2024-08-12 20:09:22,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1803680.0, ans=0.0 2024-08-12 20:09:22,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1803680.0, ans=0.0 2024-08-12 20:09:48,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1803780.0, ans=0.125 2024-08-12 20:09:56,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1803880.0, ans=0.125 2024-08-12 20:10:20,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6500, loss[loss=0.09526, beats_loss=0.012, ecapa_loss=0.0001585, whisper_loss=0.08168, over 16667.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001698, whisper_loss=0.09262, over 3906796.75 frames. ], batch size: 66, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:10:23,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1804080.0, ans=0.125 2024-08-12 20:10:35,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1804180.0, ans=0.0 2024-08-12 20:10:38,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1804180.0, ans=0.0 2024-08-12 20:10:40,605 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 20:10:48,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.417e+01 2.617e+01 2.819e+01 4.970e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-12 20:10:56,321 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-12 20:11:19,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-12 20:11:19,875 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 20:11:30,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6550, loss[loss=0.08232, beats_loss=0.0111, ecapa_loss=0.0001821, whisper_loss=0.0694, over 13236.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01091, ecapa_loss=0.0001708, whisper_loss=0.09254, over 3913277.35 frames. ], batch size: 54, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:11:48,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1804680.0, ans=0.125 2024-08-12 20:11:48,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1804680.0, ans=0.1 2024-08-12 20:12:09,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1804780.0, ans=0.0 2024-08-12 20:12:12,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1804880.0, ans=0.1 2024-08-12 20:12:18,109 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 20:12:20,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1804880.0, ans=10.0 2024-08-12 20:12:27,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1804980.0, ans=0.125 2024-08-12 20:12:34,216 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 20:12:39,692 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6600, loss[loss=0.104, beats_loss=0.01243, ecapa_loss=0.0001601, whisper_loss=0.08998, over 17461.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01088, ecapa_loss=0.0001716, whisper_loss=0.09311, over 3970775.21 frames. ], batch size: 69, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:12:40,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2024-08-12 20:12:53,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=1805180.0, ans=12.0 2024-08-12 20:12:54,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1805180.0, ans=0.125 2024-08-12 20:12:58,772 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 20:13:03,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1805180.0, ans=0.125 2024-08-12 20:13:06,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. 
limit=15.0 2024-08-12 20:13:06,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.480e+01 2.766e+01 3.110e+01 5.063e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 20:13:08,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-12 20:13:23,383 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 20:13:34,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1805480.0, ans=0.0 2024-08-12 20:13:38,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1805480.0, ans=0.1 2024-08-12 20:13:47,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6650, loss[loss=0.09513, beats_loss=0.01434, ecapa_loss=0.0001716, whisper_loss=0.07908, over 17752.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01082, ecapa_loss=0.0001727, whisper_loss=0.09353, over 3974625.27 frames. ], batch size: 75, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:13:50,614 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 20:13:56,026 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 20:14:00,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2024-08-12 20:14:04,371 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 20:14:11,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1805680.0, ans=0.1 2024-08-12 20:14:27,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2024-08-12 20:14:29,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.005e+01 2024-08-12 20:14:33,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-08-12 20:14:44,301 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 20:14:47,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1805980.0, ans=0.0 2024-08-12 20:14:50,960 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 20:14:56,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6700, loss[loss=0.1107, beats_loss=0.01253, ecapa_loss=0.0001597, whisper_loss=0.0966, over 20749.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0109, ecapa_loss=0.0001719, whisper_loss=0.09322, over 3958258.38 frames. ], batch size: 81, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:14:58,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1806080.0, ans=0.04949747468305833 2024-08-12 20:15:05,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-12 20:15:16,650 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 20:15:18,277 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 20:15:19,547 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 20:15:23,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.568e+01 2.820e+01 3.306e+01 6.884e+01, threshold=5.641e+01, percent-clipped=3.0 2024-08-12 20:15:30,920 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 20:15:38,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1806380.0, ans=0.125 2024-08-12 20:15:39,267 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 20:15:42,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1806380.0, ans=0.0 2024-08-12 20:15:49,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1806380.0, ans=0.1 2024-08-12 20:15:55,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1806480.0, ans=0.1 2024-08-12 20:16:04,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1806580.0, ans=0.035 2024-08-12 20:16:05,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6750, loss[loss=0.1073, beats_loss=0.01069, ecapa_loss=0.0001687, whisper_loss=0.09494, over 14341.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01088, ecapa_loss=0.000173, whisper_loss=0.09311, over 3907697.75 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:16:14,225 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 20:16:27,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1806680.0, ans=0.125 2024-08-12 20:16:40,873 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 20:16:53,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=1806880.0, ans=15.0 2024-08-12 20:16:55,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1806880.0, ans=0.125 2024-08-12 20:16:56,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.70 vs. limit=5.0 2024-08-12 20:17:07,361 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 20:17:15,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6800, loss[loss=0.08681, beats_loss=0.01161, ecapa_loss=0.0001777, whisper_loss=0.07342, over 19847.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01082, ecapa_loss=0.0001732, whisper_loss=0.09332, over 3891184.20 frames. ], batch size: 81, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:17:24,186 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.333e+01 2024-08-12 20:17:43,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.433e+01 2.678e+01 3.224e+01 5.136e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-12 20:17:43,318 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
14 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 20:18:18,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1807480.0, ans=0.125 2024-08-12 20:18:23,613 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 20:18:24,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6850, loss[loss=0.08603, beats_loss=0.01225, ecapa_loss=0.0001863, whisper_loss=0.07192, over 14926.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001723, whisper_loss=0.09288, over 3886452.39 frames. ], batch size: 61, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:18:26,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1807580.0, ans=0.0 2024-08-12 20:18:28,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1807580.0, ans=0.1 2024-08-12 20:18:45,176 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-12 20:18:46,262 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 20:18:46,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2024-08-12 20:19:24,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1807980.0, ans=0.125 2024-08-12 20:19:26,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. 
limit=22.5 2024-08-12 20:19:32,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1808080.0, ans=0.125 2024-08-12 20:19:33,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6900, loss[loss=0.1051, beats_loss=0.01004, ecapa_loss=0.0001572, whisper_loss=0.09347, over 21996.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01101, ecapa_loss=0.0001712, whisper_loss=0.09275, over 3909009.40 frames. ], batch size: 85, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:19:51,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1808180.0, ans=0.125 2024-08-12 20:19:57,477 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 20:20:01,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.402e+01 2.709e+01 3.139e+01 1.091e+02, threshold=5.419e+01, percent-clipped=1.0 2024-08-12 20:20:10,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1808280.0, ans=0.1 2024-08-12 20:20:13,989 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 20:20:22,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1808380.0, ans=0.1 2024-08-12 20:20:32,562 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 20:20:41,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 6950, loss[loss=0.1178, beats_loss=0.01059, ecapa_loss=0.0001878, whisper_loss=0.1054, over 21686.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01109, ecapa_loss=0.0001713, whisper_loss=0.09204, over 3887450.29 frames. 
], batch size: 89, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:20:47,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=22.5 2024-08-12 20:20:49,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1808580.0, ans=0.125 2024-08-12 20:20:58,981 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:20:59,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1808680.0, ans=0.125 2024-08-12 20:21:06,347 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-12 20:21:10,326 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 20:21:35,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1808880.0, ans=0.125 2024-08-12 20:21:37,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1808980.0, ans=0.02 2024-08-12 20:21:52,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7000, loss[loss=0.09472, beats_loss=0.01157, ecapa_loss=0.0001531, whisper_loss=0.08162, over 14796.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001722, whisper_loss=0.09191, over 3854312.36 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:21:53,891 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 20:21:59,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1809080.0, ans=0.125 2024-08-12 20:22:00,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1809080.0, ans=0.125 2024-08-12 20:22:07,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1809180.0, ans=0.1 2024-08-12 20:22:10,387 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 20:22:13,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1809180.0, ans=0.2 2024-08-12 20:22:19,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.381e+01 2.667e+01 3.091e+01 4.298e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-12 20:22:22,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1809280.0, ans=0.2 2024-08-12 20:22:33,746 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 20:22:40,980 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 20:22:56,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1809480.0, ans=0.0 2024-08-12 20:23:01,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7050, loss[loss=0.08865, beats_loss=0.0118, ecapa_loss=0.0001653, whisper_loss=0.0752, over 16969.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01105, ecapa_loss=0.0001724, whisper_loss=0.09099, over 3880901.06 frames. 
], batch size: 67, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:23:02,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1809580.0, ans=0.0 2024-08-12 20:23:05,645 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 20:23:07,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1809580.0, ans=0.125 2024-08-12 20:23:16,650 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 20:23:34,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1809780.0, ans=0.0 2024-08-12 20:23:41,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1809880.0, ans=0.1 2024-08-12 20:23:53,015 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 33 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 20:23:53,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=12.0 2024-08-12 20:23:54,734 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:23:59,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.52 vs. limit=22.5 2024-08-12 20:24:02,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.11 vs. 
limit=15.0 2024-08-12 20:24:03,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1809980.0, ans=0.0 2024-08-12 20:24:10,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7100, loss[loss=0.1119, beats_loss=0.01284, ecapa_loss=0.0001233, whisper_loss=0.0978, over 22055.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001708, whisper_loss=0.09183, over 3880242.60 frames. ], batch size: 83, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:24:19,592 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 20:24:29,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1810180.0, ans=0.2 2024-08-12 20:24:37,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1810280.0, ans=0.125 2024-08-12 20:24:38,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.554e+01 2.752e+01 3.133e+01 4.741e+01, threshold=5.504e+01, percent-clipped=0.0 2024-08-12 20:24:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1810380.0, ans=0.2 2024-08-12 20:25:04,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1810480.0, ans=0.125 2024-08-12 20:25:19,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7150, loss[loss=0.136, beats_loss=0.009487, ecapa_loss=0.0001627, whisper_loss=0.1249, over 20874.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001708, whisper_loss=0.09266, over 3909226.83 frames. 
], batch size: 78, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:25:32,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1810680.0, ans=0.2 2024-08-12 20:25:32,998 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 20:25:33,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1810680.0, ans=0.125 2024-08-12 20:25:44,308 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 20:25:47,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.51 vs. limit=22.5 2024-08-12 20:25:54,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1810780.0, ans=0.0 2024-08-12 20:25:57,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1810780.0, ans=10.0 2024-08-12 20:25:59,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2024-08-12 20:25:59,970 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 20:26:13,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1810980.0, ans=0.0 2024-08-12 20:26:28,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-12 20:26:28,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7200, loss[loss=0.1204, beats_loss=0.008755, ecapa_loss=0.0001654, whisper_loss=0.11, over 16919.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01091, ecapa_loss=0.0001719, whisper_loss=0.09316, over 3921713.09 frames. ], batch size: 66, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:26:47,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-12 20:26:49,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1811180.0, ans=0.125 2024-08-12 20:26:55,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.470e+01 2.758e+01 3.060e+01 4.587e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-12 20:27:05,733 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-12 20:27:06,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1811280.0, ans=0.2 2024-08-12 20:27:14,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1811380.0, ans=0.2 2024-08-12 20:27:21,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1811380.0, ans=0.0 2024-08-12 20:27:23,677 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 20:27:32,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1811480.0, ans=0.125 2024-08-12 20:27:37,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7250, loss[loss=0.1082, beats_loss=0.00955, ecapa_loss=0.0001762, whisper_loss=0.0969, over 20906.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01084, ecapa_loss=0.000173, whisper_loss=0.09323, over 3912992.23 frames. 
], batch size: 82, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:27:43,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1811580.0, ans=0.0 2024-08-12 20:27:44,298 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 20:28:16,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-12 20:28:20,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1811880.0, ans=0.125 2024-08-12 20:28:23,993 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 20:28:26,793 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 20:28:28,070 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 20:28:32,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1811980.0, ans=0.2 2024-08-12 20:28:33,815 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 20:28:35,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1811980.0, ans=0.2 2024-08-12 20:28:47,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7300, loss[loss=0.1149, beats_loss=0.01242, ecapa_loss=0.0001515, whisper_loss=0.101, over 21494.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0109, ecapa_loss=0.0001734, whisper_loss=0.0931, over 3935070.52 frames. 
], batch size: 83, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:28:50,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1812080.0, ans=0.125 2024-08-12 20:29:00,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1812180.0, ans=0.0 2024-08-12 20:29:14,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.458e+01 2.787e+01 3.037e+01 3.790e+01, threshold=5.575e+01, percent-clipped=0.0 2024-08-12 20:29:19,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1812280.0, ans=0.125 2024-08-12 20:29:20,959 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 13 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-12 20:29:22,293 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-12 20:29:25,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1812280.0, ans=0.125 2024-08-12 20:29:29,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1812380.0, ans=0.0 2024-08-12 20:29:40,056 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 20:29:56,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7350, loss[loss=0.09415, beats_loss=0.008826, ecapa_loss=0.0001522, whisper_loss=0.0838, over 14921.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.011, ecapa_loss=0.0001718, whisper_loss=0.09151, over 3912418.32 frames. 
], batch size: 57, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:29:58,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1812580.0, ans=0.0 2024-08-12 20:29:58,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.67 vs. limit=15.0 2024-08-12 20:30:03,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1812580.0, ans=0.0 2024-08-12 20:30:18,792 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 20:30:19,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1812680.0, ans=0.0 2024-08-12 20:30:32,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2024-08-12 20:30:34,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1812780.0, ans=0.125 2024-08-12 20:30:42,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1812880.0, ans=0.125 2024-08-12 20:30:47,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. 
limit=15.0 2024-08-12 20:31:00,058 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:31:04,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1813080.0, ans=0.0 2024-08-12 20:31:04,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7400, loss[loss=0.09343, beats_loss=0.01053, ecapa_loss=0.0001769, whisper_loss=0.08113, over 19717.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01099, ecapa_loss=0.0001717, whisper_loss=0.09091, over 3899503.24 frames. ], batch size: 82, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:31:32,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.493e+01 2.726e+01 3.079e+01 4.243e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 20:31:35,522 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 20:31:38,227 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 20:31:44,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-08-12 20:31:52,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1813380.0, ans=0.125 2024-08-12 20:32:12,411 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 20:32:13,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7450, loss[loss=0.123, beats_loss=0.009922, ecapa_loss=0.0001861, whisper_loss=0.1112, over 22466.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001712, whisper_loss=0.09152, over 3926470.36 frames. 
], batch size: 88, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:32:14,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1813580.0, ans=0.125 2024-08-12 20:32:29,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1813680.0, ans=0.1 2024-08-12 20:32:31,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1813680.0, ans=0.125 2024-08-12 20:32:37,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-12 20:32:40,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-08-12 20:32:41,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1813780.0, ans=0.125 2024-08-12 20:33:04,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1813880.0, ans=0.125 2024-08-12 20:33:08,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1813980.0, ans=0.1 2024-08-12 20:33:11,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.82 vs. 
limit=22.5 2024-08-12 20:33:16,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1813980.0, ans=0.0 2024-08-12 20:33:19,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1813980.0, ans=0.125 2024-08-12 20:33:21,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7500, loss[loss=0.08705, beats_loss=0.01047, ecapa_loss=0.000173, whisper_loss=0.07485, over 19034.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001713, whisper_loss=0.09184, over 3942904.22 frames. ], batch size: 78, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:33:48,379 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 20:33:49,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.399e+01 2.676e+01 3.018e+01 5.657e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-12 20:34:02,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1814380.0, ans=0.1 2024-08-12 20:34:16,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-12 20:34:18,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1814480.0, ans=10.0 2024-08-12 20:34:22,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-12 20:34:31,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7550, loss[loss=0.0923, beats_loss=0.01167, ecapa_loss=0.0001511, whisper_loss=0.07912, over 19761.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.0001699, whisper_loss=0.09187, over 3904682.30 frames. ], batch size: 81, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:34:35,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1814580.0, ans=0.125 2024-08-12 20:34:45,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1814680.0, ans=0.2 2024-08-12 20:34:45,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.71 vs. limit=10.0 2024-08-12 20:34:50,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1814680.0, ans=0.1 2024-08-12 20:34:55,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-12 20:35:05,711 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 20:35:08,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1814780.0, ans=0.125 2024-08-12 20:35:14,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-12 20:35:32,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1814980.0, ans=0.0 2024-08-12 20:35:34,034 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
17 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-12 20:35:40,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7600, loss[loss=0.1043, beats_loss=0.01121, ecapa_loss=0.0001933, whisper_loss=0.0912, over 19983.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001714, whisper_loss=0.0916, over 3877271.66 frames. ], batch size: 85, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:35:48,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1815080.0, ans=0.125 2024-08-12 20:35:49,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1815080.0, ans=0.125 2024-08-12 20:36:02,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1815180.0, ans=0.0 2024-08-12 20:36:08,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.568e+01 2.871e+01 3.338e+01 1.735e+02, threshold=5.742e+01, percent-clipped=2.0 2024-08-12 20:36:17,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1815280.0, ans=0.125 2024-08-12 20:36:29,856 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 31 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 20:36:35,503 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 20:36:42,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1815480.0, ans=0.125 2024-08-12 20:36:46,476 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 20:36:50,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7650, loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001755, whisper_loss=0.09321, over 22056.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.000171, whisper_loss=0.09152, over 3901873.13 frames. ], batch size: 88, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:36:55,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-12 20:37:04,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1815680.0, ans=0.125 2024-08-12 20:37:17,371 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 20:37:17,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1815780.0, ans=0.2 2024-08-12 20:37:34,051 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 20:37:38,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1815880.0, ans=0.125 2024-08-12 20:37:39,654 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 20:37:41,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1815880.0, ans=0.125 2024-08-12 20:37:49,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=12.0 2024-08-12 20:37:50,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1815980.0, ans=0.125 2024-08-12 20:37:55,906 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-12 20:37:59,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7700, loss[loss=0.08908, beats_loss=0.01301, ecapa_loss=0.0001472, whisper_loss=0.07459, over 22289.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001708, whisper_loss=0.09133, over 3918023.15 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:38:00,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1816080.0, ans=0.0 2024-08-12 20:38:21,030 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 20:38:22,766 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.840e-02 2024-08-12 20:38:27,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.538e+01 2.763e+01 3.264e+01 5.327e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 20:38:36,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1816280.0, ans=0.0 2024-08-12 20:38:38,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2024-08-12 20:38:56,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1816480.0, ans=0.1 2024-08-12 20:39:00,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-12 20:39:01,443 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 20:39:08,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7750, loss[loss=0.1092, beats_loss=0.0116, ecapa_loss=0.0001325, whisper_loss=0.09626, over 23305.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001703, whisper_loss=0.09106, over 3888522.62 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:39:12,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1816580.0, ans=0.125 2024-08-12 20:39:16,380 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 22 from LS+wenet, 7 from Vox, 25 fro AS 2024-08-12 20:39:20,456 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 20:39:32,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1816680.0, ans=22.5 2024-08-12 20:39:34,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1816680.0, ans=0.125 2024-08-12 20:39:39,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. 
limit=22.5 2024-08-12 20:39:43,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1816780.0, ans=0.125 2024-08-12 20:39:53,807 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:39:55,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1816880.0, ans=0.1 2024-08-12 20:39:58,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1816880.0, ans=0.0 2024-08-12 20:40:05,054 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-12 20:40:07,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1816980.0, ans=0.125 2024-08-12 20:40:11,524 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 20:40:18,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7800, loss[loss=0.1172, beats_loss=0.01024, ecapa_loss=0.0001613, whisper_loss=0.1054, over 21358.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001707, whisper_loss=0.09152, over 3888840.68 frames. ], batch size: 80, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:40:23,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1817080.0, ans=0.125 2024-08-12 20:40:25,335 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 20:40:33,198 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 20:40:36,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1817180.0, ans=0.0 2024-08-12 20:40:36,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-12 20:40:42,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1817180.0, ans=0.035 2024-08-12 20:40:44,811 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 20:40:45,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.560e+01 2.836e+01 3.091e+01 4.411e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 20:40:59,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1817380.0, ans=0.95 2024-08-12 20:41:17,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1817480.0, ans=0.125 2024-08-12 20:41:27,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7850, loss[loss=0.106, beats_loss=0.009471, ecapa_loss=0.0001625, whisper_loss=0.09487, over 14367.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001715, whisper_loss=0.09152, over 3891775.28 frames. ], batch size: 54, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:41:40,814 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 20:41:48,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1817680.0, ans=0.0 2024-08-12 20:41:49,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1817680.0, ans=0.125 2024-08-12 20:41:51,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1817680.0, ans=0.05 2024-08-12 20:41:53,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1817780.0, ans=0.1 2024-08-12 20:42:04,728 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 20:42:25,515 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 20:42:36,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7900, loss[loss=0.09909, beats_loss=0.01286, ecapa_loss=0.0001585, whisper_loss=0.08464, over 22217.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01101, ecapa_loss=0.0001718, whisper_loss=0.09217, over 3936238.91 frames. ], batch size: 88, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:04,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.497e+01 2.722e+01 3.152e+01 4.641e+01, threshold=5.444e+01, percent-clipped=0.0 2024-08-12 20:43:18,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1818380.0, ans=0.0 2024-08-12 20:43:26,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1818380.0, ans=0.0 2024-08-12 20:43:37,390 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 20:43:39,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1818480.0, ans=0.125 2024-08-12 20:43:45,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 7950, loss[loss=0.08232, beats_loss=0.01256, ecapa_loss=0.0001732, whisper_loss=0.06803, over 16579.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01102, ecapa_loss=0.0001709, whisper_loss=0.09164, over 3931314.30 frames. ], batch size: 69, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:49,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2024-08-12 20:43:56,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2024-08-12 20:43:57,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1818580.0, ans=0.0 2024-08-12 20:43:58,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1818680.0, ans=0.1 2024-08-12 20:44:06,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1818680.0, ans=0.125 2024-08-12 20:44:13,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=22.5 2024-08-12 20:44:19,995 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 20:44:25,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1818880.0, ans=0.2 2024-08-12 20:44:41,467 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
12 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 20:44:55,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8000, loss[loss=0.09424, beats_loss=0.01273, ecapa_loss=0.0001523, whisper_loss=0.07999, over 22485.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001705, whisper_loss=0.09147, over 3919976.02 frames. ], batch size: 91, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:45:22,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.456e+01 2.721e+01 3.092e+01 4.967e+01, threshold=5.442e+01, percent-clipped=0.0 2024-08-12 20:45:24,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1819280.0, ans=0.125 2024-08-12 20:45:34,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-12 20:45:45,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1819380.0, ans=0.2 2024-08-12 20:45:47,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1819380.0, ans=0.125 2024-08-12 20:45:49,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1819480.0, ans=0.125 2024-08-12 20:45:51,995 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 20:46:04,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8050, loss[loss=0.07407, beats_loss=0.01411, ecapa_loss=0.0001464, whisper_loss=0.0585, over 19311.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.00017, whisper_loss=0.09185, over 3897120.99 frames. ], batch size: 79, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:46:10,207 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 20:46:13,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1819580.0, ans=0.125 2024-08-12 20:46:14,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1819580.0, ans=0.0 2024-08-12 20:46:15,725 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 20:46:24,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1819680.0, ans=0.125 2024-08-12 20:46:43,338 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-12 20:46:47,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1819880.0, ans=0.0 2024-08-12 20:46:48,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1819880.0, ans=0.125 2024-08-12 20:47:13,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8100, loss[loss=0.1116, beats_loss=0.009052, ecapa_loss=0.0001971, whisper_loss=0.1006, over 16479.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.0001706, whisper_loss=0.09108, over 3879342.52 frames. ], batch size: 63, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:47:14,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2024-08-12 20:47:27,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1820180.0, ans=0.025 2024-08-12 20:47:33,812 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 20:47:40,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.501e+01 2.882e+01 3.230e+01 4.763e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 20:47:40,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1820280.0, ans=0.0 2024-08-12 20:47:45,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1820280.0, ans=0.0 2024-08-12 20:47:48,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1820280.0, ans=0.125 2024-08-12 20:47:56,144 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 9 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 20:47:58,872 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 20:48:03,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-12 20:48:15,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-12 20:48:18,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1820480.0, ans=0.1 2024-08-12 20:48:22,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8150, loss[loss=0.08886, beats_loss=0.01254, ecapa_loss=0.0001296, whisper_loss=0.07502, over 15497.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001723, whisper_loss=0.09146, over 3884918.98 frames. 
], batch size: 60, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:48:45,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1820680.0, ans=0.125 2024-08-12 20:48:55,794 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-12 20:49:01,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-12 20:49:18,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1820980.0, ans=0.125 2024-08-12 20:49:30,330 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 20:49:31,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8200, loss[loss=0.1178, beats_loss=0.01038, ecapa_loss=0.0001856, whisper_loss=0.1055, over 15387.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01099, ecapa_loss=0.0001732, whisper_loss=0.09168, over 3900288.31 frames. ], batch size: 63, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:49:36,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1821080.0, ans=0.125 2024-08-12 20:49:39,038 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-12 20:49:44,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1821180.0, ans=0.125 2024-08-12 20:49:51,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1821180.0, ans=0.1 2024-08-12 20:49:53,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1821180.0, ans=0.125 2024-08-12 20:49:53,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1821180.0, ans=0.04949747468305833 2024-08-12 20:49:59,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.516e+01 2.770e+01 3.136e+01 5.305e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 20:50:00,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-12 20:50:02,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1821280.0, ans=0.125 2024-08-12 20:50:08,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2024-08-12 20:50:15,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1821380.0, ans=0.0 2024-08-12 20:50:35,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1821480.0, ans=0.125 2024-08-12 20:50:40,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8250, loss[loss=0.1147, beats_loss=0.0093, ecapa_loss=0.0001785, whisper_loss=0.1036, over 17027.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001731, whisper_loss=0.09171, over 3894980.99 frames. ], batch size: 64, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:50:42,183 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 from AS
2024-08-12 20:50:47,882 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 34 from LS+wenet, 16 from Vox, 26 from AS
2024-08-12 20:50:49,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0
2024-08-12 20:50:54,774 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS
2024-08-12 20:50:54,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1821680.0, ans=0.0
2024-08-12 20:51:03,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1821680.0, ans=0.125
2024-08-12 20:51:13,165 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 12 from LS+wenet, 10 from Vox, 31 from AS
2024-08-12 20:51:16,151 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 20:51:19,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1821780.0, ans=0.0
2024-08-12 20:51:23,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1821880.0, ans=0.125
2024-08-12 20:51:28,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0
2024-08-12 20:51:35,493 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 from AS
2024-08-12 20:51:35,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1821980.0, ans=0.125
2024-08-12 20:51:35,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1821980.0, ans=0.05
2024-08-12 20:51:37,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0
2024-08-12 20:51:49,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1822080.0, ans=0.125
2024-08-12 20:51:50,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8300, loss[loss=0.1256, beats_loss=0.009208, ecapa_loss=0.0002163, whisper_loss=0.1142, over 22022.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001707, whisper_loss=0.09158, over 3890351.32 frames. ], batch size: 94, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:51:50,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1822080.0, ans=0.0
2024-08-12 20:51:52,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs. limit=10.0
2024-08-12 20:51:53,139 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-12 20:52:14,881 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 from AS
2024-08-12 20:52:17,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.463e+01 2.692e+01 3.120e+01 9.968e+01, threshold=5.383e+01, percent-clipped=3.0
2024-08-12 20:52:23,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1822280.0, ans=0.0
2024-08-12 20:52:25,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1822280.0, ans=0.0
2024-08-12 20:52:38,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1822380.0, ans=0.0
2024-08-12 20:52:48,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1822480.0, ans=0.125
2024-08-12 20:52:51,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1822480.0, ans=0.125
2024-08-12 20:52:58,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8350, loss[loss=0.117, beats_loss=0.009734, ecapa_loss=0.0001542, whisper_loss=0.1057, over 19864.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001703, whisper_loss=0.09198, over 3906288.75 frames. ], batch size: 73, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:53:01,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2024-08-12 20:53:02,161 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS
2024-08-12 20:53:15,371 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 19 from LS+wenet, 24 from Vox, 45 from AS
2024-08-12 20:53:22,453 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS
2024-08-12 20:53:28,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1822780.0, ans=0.125
2024-08-12 20:53:31,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0
2024-08-12 20:53:32,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1822780.0, ans=0.0
2024-08-12 20:53:46,921 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 21 from Vox, 33 from AS
2024-08-12 20:53:58,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0
2024-08-12 20:54:07,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8400, loss[loss=0.1024, beats_loss=0.0116, ecapa_loss=0.0001744, whisper_loss=0.08904, over 22255.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001702, whisper_loss=0.09161, over 3921335.69 frames. ], batch size: 92, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:54:09,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1823080.0, ans=0.0
2024-08-12 20:54:29,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1823180.0, ans=0.125
2024-08-12 20:54:31,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0
2024-08-12 20:54:32,036 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 23 from Vox, 23 from AS
2024-08-12 20:54:35,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.526e+01 2.875e+01 3.220e+01 4.758e+01, threshold=5.750e+01, percent-clipped=0.0
2024-08-12 20:54:50,843 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 23 from Vox, 23 from AS
2024-08-12 20:54:59,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1823380.0, ans=0.2
2024-08-12 20:55:02,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1823380.0, ans=0.125
2024-08-12 20:55:04,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1823480.0, ans=0.125
2024-08-12 20:55:13,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0
2024-08-12 20:55:14,059 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 20:55:18,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8450, loss[loss=0.1048, beats_loss=0.01259, ecapa_loss=0.0001306, whisper_loss=0.09088, over 15045.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001702, whisper_loss=0.09181, over 3911364.43 frames. ], batch size: 57, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:55:19,162 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 from AS
2024-08-12 20:55:22,169 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 from AS
2024-08-12 20:55:26,903 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 27 from LS+wenet, 12 from Vox, 26 from AS
2024-08-12 20:55:42,109 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 18 from Vox, 21 from AS
2024-08-12 20:55:44,802 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 20:56:07,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0
2024-08-12 20:56:23,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1823980.0, ans=6.0
2024-08-12 20:56:26,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1823980.0, ans=0.0
2024-08-12 20:56:31,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8500, loss[loss=0.1029, beats_loss=0.01172, ecapa_loss=0.0001568, whisper_loss=0.08965, over 21667.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01094, ecapa_loss=0.0001701, whisper_loss=0.09231, over 3917399.97 frames. ], batch size: 88, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:56:58,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0
2024-08-12 20:57:01,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.569e+01 2.792e+01 3.196e+01 4.300e+01, threshold=5.585e+01, percent-clipped=0.0
2024-08-12 20:57:04,998 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 20:57:06,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1824280.0, ans=0.0
2024-08-12 20:57:12,257 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 12 from Vox, 29 from AS
2024-08-12 20:57:13,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1824280.0, ans=0.125
2024-08-12 20:57:14,724 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 from AS
2024-08-12 20:57:28,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1824380.0, ans=0.1
2024-08-12 20:57:38,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1824480.0, ans=0.125
2024-08-12 20:57:41,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1824480.0, ans=0.07
2024-08-12 20:57:46,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8550, loss[loss=0.129, beats_loss=0.008748, ecapa_loss=0.0001656, whisper_loss=0.1186, over 24094.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0001705, whisper_loss=0.09226, over 3872130.20 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:57:52,144 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 15 from Vox, 46 from AS
2024-08-12 20:58:09,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1824680.0, ans=0.0
2024-08-12 20:58:11,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1824680.0, ans=0.1
2024-08-12 20:58:12,354 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 19 from LS+wenet, 27 from Vox, 49 from AS
2024-08-12 20:58:30,843 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 from AS
2024-08-12 20:58:39,689 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 from AS
2024-08-12 20:58:58,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8600, loss[loss=0.1117, beats_loss=0.01071, ecapa_loss=0.000218, whisper_loss=0.09879, over 21649.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001719, whisper_loss=0.09232, over 3860112.50 frames. ], batch size: 93, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 20:59:05,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1825080.0, ans=0.04949747468305833
2024-08-12 20:59:06,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
2024-08-12 20:59:10,814 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 20:59:19,841 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 from AS
2024-08-12 20:59:21,397 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 from AS
2024-08-12 20:59:31,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.497e+01 2.777e+01 3.095e+01 5.281e+01, threshold=5.554e+01, percent-clipped=0.0
2024-08-12 20:59:33,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0
2024-08-12 20:59:37,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-08-12 20:59:46,966 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 20:59:50,312 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 from AS
2024-08-12 21:00:06,669 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 21:00:17,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8650, loss[loss=0.108, beats_loss=0.01095, ecapa_loss=0.0001663, whisper_loss=0.09538, over 16536.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001721, whisper_loss=0.09113, over 3859559.08 frames. ], batch size: 65, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:00:27,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1825580.0, ans=0.125
2024-08-12 21:00:42,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1825680.0, ans=0.125
2024-08-12 21:00:55,574 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 from AS
2024-08-12 21:01:03,134 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS
2024-08-12 21:01:03,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1825880.0, ans=0.04949747468305833
2024-08-12 21:01:09,540 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 from AS
2024-08-12 21:01:09,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1825880.0, ans=0.125
2024-08-12 21:01:15,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1825880.0, ans=0.95
2024-08-12 21:01:21,504 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 from AS
2024-08-12 21:01:27,355 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 32 from LS+wenet, 24 from Vox, 41 from AS
2024-08-12 21:01:33,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8700, loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.000182, whisper_loss=0.08971, over 18103.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001722, whisper_loss=0.0914, over 3850233.37 frames. ], batch size: 73, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:01:35,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1826080.0, ans=0.125
2024-08-12 21:01:40,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1826080.0, ans=0.125
2024-08-12 21:01:46,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1826080.0, ans=0.0
2024-08-12 21:01:51,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
2024-08-12 21:01:59,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2024-08-12 21:02:04,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.617e+01 2.806e+01 3.109e+01 1.024e+02, threshold=5.612e+01, percent-clipped=1.0
2024-08-12 21:02:07,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1826280.0, ans=0.125
2024-08-12 21:02:13,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0
2024-08-12 21:02:41,180 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 33 from Vox, 35 from AS
2024-08-12 21:02:41,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1826480.0, ans=0.1
2024-08-12 21:02:43,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1826480.0, ans=0.0
2024-08-12 21:02:50,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8750, loss[loss=0.1251, beats_loss=0.01046, ecapa_loss=0.0001597, whisper_loss=0.113, over 23447.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01088, ecapa_loss=0.0001728, whisper_loss=0.09205, over 3873809.59 frames. ], batch size: 89, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:03:26,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1826780.0, ans=0.0
2024-08-12 21:03:27,766 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 from AS
2024-08-12 21:03:37,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0
2024-08-12 21:04:03,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1826980.0, ans=0.0
2024-08-12 21:04:08,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8800, loss[loss=0.1008, beats_loss=0.01262, ecapa_loss=0.0001189, whisper_loss=0.08701, over 23396.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001731, whisper_loss=0.09116, over 3871017.88 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:04:17,640 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 20 from Vox, 34 from AS
2024-08-12 21:04:27,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1827180.0, ans=0.1
2024-08-12 21:04:39,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.528e+01 2.804e+01 3.159e+01 1.036e+02, threshold=5.609e+01, percent-clipped=2.0
2024-08-12 21:04:42,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1827280.0, ans=0.2
2024-08-12 21:04:44,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0
2024-08-12 21:04:46,233 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS
2024-08-12 21:04:57,440 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 from AS
2024-08-12 21:05:04,921 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 from AS
2024-08-12 21:05:16,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1827480.0, ans=0.125
2024-08-12 21:05:21,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1827480.0, ans=0.125
2024-08-12 21:05:21,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1827480.0, ans=0.0
2024-08-12 21:05:26,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8850, loss[loss=0.08355, beats_loss=0.01172, ecapa_loss=0.0001496, whisper_loss=0.07034, over 16161.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01107, ecapa_loss=0.0001725, whisper_loss=0.09055, over 3850757.54 frames. ], batch size: 65, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:05:27,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0
2024-08-12 21:05:36,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1827580.0, ans=0.125
2024-08-12 21:05:44,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1827680.0, ans=0.1
2024-08-12 21:05:47,185 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 21:05:58,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5
2024-08-12 21:06:10,918 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 from AS
2024-08-12 21:06:16,932 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 from AS
2024-08-12 21:06:19,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1827880.0, ans=10.0
2024-08-12 21:06:19,955 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 21:06:28,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1827980.0, ans=0.125
2024-08-12 21:06:33,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1827980.0, ans=0.125
2024-08-12 21:06:42,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8900, loss[loss=0.1085, beats_loss=0.01042, ecapa_loss=0.0001707, whisper_loss=0.09634, over 18822.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01109, ecapa_loss=0.0001715, whisper_loss=0.09054, over 3852272.92 frames. ], batch size: 73, lr: 4.82e-03, grad_scale: 1.152921504606847e+18
2024-08-12 21:07:03,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1828180.0, ans=0.125
2024-08-12 21:07:15,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=12.0
2024-08-12 21:07:15,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.512e+01 2.855e+01 3.103e+01 6.109e+01, threshold=5.710e+01, percent-clipped=1.0
2024-08-12 21:07:22,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1828280.0, ans=0.125
2024-08-12 21:07:50,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1828480.0, ans=0.5
2024-08-12 21:07:53,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0
2024-08-12 21:07:54,250 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS
2024-08-12 21:07:59,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 8950, loss[loss=0.06303, beats_loss=0.01376, ecapa_loss=0.0001634, whisper_loss=0.04764, over 17060.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01109, ecapa_loss=0.0001715, whisper_loss=0.0907, over 3875238.49 frames. ], batch size: 69, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:08:03,723 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS
2024-08-12 21:08:08,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5
2024-08-12 21:08:09,240 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 23 from Vox, 31 from AS
2024-08-12 21:08:25,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1828680.0, ans=0.125
2024-08-12 21:08:29,139 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 from AS
2024-08-12 21:08:37,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1828780.0, ans=0.0
2024-08-12 21:08:51,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1828880.0, ans=0.1
2024-08-12 21:08:59,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0
2024-08-12 21:09:12,381 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 from AS
2024-08-12 21:09:16,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9000, loss[loss=0.084, beats_loss=0.01202, ecapa_loss=0.0001647, whisper_loss=0.07033, over 14913.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001735, whisper_loss=0.09112, over 3879817.18 frames. ], batch size: 60, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:09:16,144 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-12 21:09:54,945 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005776, whisper_loss=0.2483, over 922467.00 frames.
2024-08-12 21:10:13,730 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on SV_voxceleb1: loss=0.004711, beats_loss=0, ecapa_loss=0.0004711, whisper_loss=0, over 939242.00 frames.
2024-08-12 21:12:02,788 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 21:12:02,792 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-12 21:12:08,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1829080.0, ans=0.1
2024-08-12 21:12:10,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1829080.0, ans=0.2
2024-08-12 21:12:37,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.414e+01 2.685e+01 3.059e+01 6.063e+01, threshold=5.370e+01, percent-clipped=1.0
2024-08-12 21:12:54,707 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS
2024-08-12 21:13:14,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1829480.0, ans=0.0
2024-08-12 21:13:15,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1829480.0, ans=0.125
2024-08-12 21:13:20,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1829480.0, ans=0.125
2024-08-12 21:13:22,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9050, loss[loss=0.09534, beats_loss=0.0102, ecapa_loss=0.000211, whisper_loss=0.08302, over 20376.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01097, ecapa_loss=0.000174, whisper_loss=0.09083, over 3852432.59 frames. ], batch size: 88, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:13:24,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1829580.0, ans=0.0
2024-08-12 21:14:06,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1829780.0, ans=0.2
2024-08-12 21:14:06,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5
2024-08-12 21:14:13,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1829880.0, ans=0.0
2024-08-12 21:14:14,780 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 from AS
2024-08-12 21:14:16,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1829880.0, ans=0.1
2024-08-12 21:14:21,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1829880.0, ans=0.0
2024-08-12 21:14:26,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1829980.0, ans=0.0
2024-08-12 21:14:38,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9100, loss[loss=0.1219, beats_loss=0.009668, ecapa_loss=0.0001757, whisper_loss=0.1105, over 22741.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001737, whisper_loss=0.0914, over 3881688.93 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:14:42,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1830080.0, ans=0.2
2024-08-12 21:15:05,808 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 from AS
2024-08-12 21:15:06,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1830180.0, ans=0.1
2024-08-12 21:15:11,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.477e+01 2.788e+01 3.055e+01 6.197e+01, threshold=5.576e+01, percent-clipped=1.0
2024-08-12 21:15:17,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0
2024-08-12 21:15:18,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1830280.0, ans=0.2
2024-08-12 21:15:29,997 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 from AS
2024-08-12 21:15:49,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0
2024-08-12 21:15:51,527 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS
2024-08-12 21:15:56,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9150, loss[loss=0.1202, beats_loss=0.008905, ecapa_loss=0.0001622, whisper_loss=0.1097, over 16800.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001734, whisper_loss=0.09181, over 3911276.14 frames. ], batch size: 61, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:15:57,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=12.0
2024-08-12 21:16:07,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1830580.0, ans=0.125
2024-08-12 21:16:13,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1830680.0, ans=0.0
2024-08-12 21:16:43,241 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 from AS
2024-08-12 21:16:59,536 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-12 21:17:02,217 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS
2024-08-12 21:17:10,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9200, loss[loss=0.09642, beats_loss=0.009937, ecapa_loss=0.0001729, whisper_loss=0.08475, over 13919.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001728, whisper_loss=0.09168, over 3901653.86 frames. ], batch size: 56, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:17:18,385 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 20 from Vox, 49 from AS
2024-08-12 21:17:30,606 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 from AS
2024-08-12 21:17:42,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.481e+01 2.738e+01 3.160e+01 4.519e+01, threshold=5.476e+01, percent-clipped=0.0
2024-08-12 21:18:07,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=12.0
2024-08-12 21:18:26,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9250, loss[loss=0.1083, beats_loss=0.01026, ecapa_loss=0.0001809, whisper_loss=0.09627, over 22128.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.0001715, whisper_loss=0.09115, over 3888989.09 frames. ], batch size: 92, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:18:28,486 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 from AS
2024-08-12 21:18:39,475 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.796e-01
2024-08-12 21:18:43,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1831680.0, ans=0.0
2024-08-12 21:18:44,559 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 from AS
2024-08-12 21:18:55,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1831780.0, ans=0.0
2024-08-12 21:18:59,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=8.0
2024-08-12 21:19:22,189 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 from AS
2024-08-12 21:19:37,209 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS
2024-08-12 21:19:41,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9300, loss[loss=0.1236, beats_loss=0.01015, ecapa_loss=0.0001514, whisper_loss=0.112, over 21472.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001713, whisper_loss=0.09128, over 3870868.40 frames. ], batch size: 82, lr: 4.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 21:19:45,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1832080.0, ans=0.1
2024-08-12 21:19:47,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0
2024-08-12 21:20:02,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1832180.0, ans=0.125
2024-08-12 21:20:07,967 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS
2024-08-12 21:20:11,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.613e+01 2.997e+01 3.337e+01 4.853e+01, threshold=5.993e+01, percent-clipped=0.0
2024-08-12 21:20:16,652 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 34 from Vox, 34 from AS
2024-08-12 21:20:25,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2024-08-12 21:20:26,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1832380.0, ans=0.125
2024-08-12 21:20:45,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1832480.0, ans=0.07
2024-08-12 21:20:54,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9350, loss[loss=0.1061, beats_loss=0.01354, ecapa_loss=0.0001074, whisper_loss=0.09148, over 17587.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.000172, whisper_loss=0.09117, over 3866148.64 frames.
], batch size: 65, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:21:02,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1832580.0, ans=0.1 2024-08-12 21:21:05,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1832580.0, ans=0.125 2024-08-12 21:21:05,447 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.584e-01 2024-08-12 21:21:24,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1832780.0, ans=0.0 2024-08-12 21:21:24,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1832780.0, ans=0.125 2024-08-12 21:21:25,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1832780.0, ans=0.0 2024-08-12 21:21:38,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1832880.0, ans=0.125 2024-08-12 21:21:42,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2024-08-12 21:21:44,829 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 21:21:52,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-12 21:22:05,499 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 21:22:08,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9400, loss[loss=0.1035, beats_loss=0.01278, ecapa_loss=0.0001549, whisper_loss=0.08918, over 18986.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001709, whisper_loss=0.091, over 3875678.65 frames. ], batch size: 76, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:22:13,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1833080.0, ans=0.125 2024-08-12 21:22:13,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-08-12 21:22:40,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1833280.0, ans=0.0 2024-08-12 21:22:40,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.361e+01 2.679e+01 2.977e+01 4.432e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-12 21:22:51,639 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 21:23:08,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1833480.0, ans=0.125 2024-08-12 21:23:24,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9450, loss[loss=0.08222, beats_loss=0.009451, ecapa_loss=0.0002136, whisper_loss=0.07063, over 14618.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01094, ecapa_loss=0.0001717, whisper_loss=0.09019, over 3836444.68 frames. ], batch size: 59, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:23:28,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. 
limit=15.0 2024-08-12 21:23:36,883 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 21:23:41,639 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 21:23:57,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1833780.0, ans=0.125 2024-08-12 21:24:05,249 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 21:24:08,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.30 vs. limit=22.5 2024-08-12 21:24:09,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2024-08-12 21:24:16,739 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 21:24:21,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-12 21:24:36,657 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 21:24:39,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9500, loss[loss=0.12, beats_loss=0.01145, ecapa_loss=0.0001354, whisper_loss=0.1072, over 23991.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01092, ecapa_loss=0.0001718, whisper_loss=0.09031, over 3846735.97 frames. ], batch size: 93, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:24:47,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.85 vs. 
limit=12.0 2024-08-12 21:25:01,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1834180.0, ans=0.0 2024-08-12 21:25:09,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.426e+01 2.699e+01 3.219e+01 5.763e+01, threshold=5.398e+01, percent-clipped=1.0 2024-08-12 21:25:26,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1834380.0, ans=0.125 2024-08-12 21:25:36,578 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 21:25:36,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1834480.0, ans=0.2 2024-08-12 21:25:46,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1834480.0, ans=0.125 2024-08-12 21:25:47,508 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 21:25:50,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9550, loss[loss=0.1105, beats_loss=0.008055, ecapa_loss=0.0001995, whisper_loss=0.1005, over 18681.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.0001717, whisper_loss=0.0901, over 3871019.54 frames. ], batch size: 78, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:25:50,701 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 21:26:01,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1834580.0, ans=10.0 2024-08-12 21:26:26,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1834780.0, ans=0.125 2024-08-12 21:26:46,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1834980.0, ans=0.125 2024-08-12 21:26:48,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1834980.0, ans=0.125 2024-08-12 21:27:01,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9600, loss[loss=0.1195, beats_loss=0.009928, ecapa_loss=0.0001576, whisper_loss=0.108, over 22788.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.000173, whisper_loss=0.09106, over 3843261.44 frames. ], batch size: 89, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:27:02,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1835080.0, ans=0.09899494936611666 2024-08-12 21:27:05,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835080.0, ans=0.1 2024-08-12 21:27:05,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1835080.0, ans=0.125 2024-08-12 21:27:07,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-08-12 21:27:17,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1835180.0, ans=0.125 2024-08-12 21:27:20,856 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 21:27:30,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.578e+01 2.916e+01 3.452e+01 6.223e+01, threshold=5.833e+01, percent-clipped=1.0 2024-08-12 21:27:58,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1835480.0, ans=0.125 2024-08-12 21:28:01,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1835480.0, ans=0.125 2024-08-12 21:28:07,793 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 21:28:10,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9650, loss[loss=0.1304, beats_loss=0.008964, ecapa_loss=0.0001796, whisper_loss=0.1196, over 24073.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001749, whisper_loss=0.09061, over 3804674.59 frames. ], batch size: 92, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:28:13,268 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 21:28:41,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1835780.0, ans=0.125 2024-08-12 21:28:50,664 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 21:29:03,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1835880.0, ans=0.0 2024-08-12 21:29:19,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9700, loss[loss=0.1009, beats_loss=0.01227, ecapa_loss=0.0001923, whisper_loss=0.08666, over 20413.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01092, ecapa_loss=0.0001739, whisper_loss=0.0909, over 3822281.97 frames. ], batch size: 84, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:29:25,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1836080.0, ans=0.2 2024-08-12 21:29:27,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-12 21:29:35,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1836180.0, ans=0.125 2024-08-12 21:29:41,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1836180.0, ans=0.2 2024-08-12 21:29:48,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.429e+01 2.686e+01 3.028e+01 5.758e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-12 21:29:59,178 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-12 21:30:03,233 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 21:30:13,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1836380.0, ans=0.125 2024-08-12 21:30:26,822 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.357e+01 2024-08-12 21:30:30,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9750, loss[loss=0.0869, beats_loss=0.01142, ecapa_loss=0.0001766, whisper_loss=0.07371, over 17339.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01096, ecapa_loss=0.0001742, whisper_loss=0.09064, over 3822716.10 frames. ], batch size: 73, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:30:47,303 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 21:30:53,088 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 21:31:03,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1836780.0, ans=0.0 2024-08-12 21:31:04,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1836780.0, ans=0.125 2024-08-12 21:31:11,101 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 21:31:20,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1836880.0, ans=0.125 2024-08-12 21:31:23,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1836880.0, ans=0.025 2024-08-12 21:31:30,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1836980.0, ans=0.125 2024-08-12 21:31:42,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9800, loss[loss=0.1243, beats_loss=0.00756, ecapa_loss=0.0002255, whisper_loss=0.1145, over 17692.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.011, ecapa_loss=0.0001724, whisper_loss=0.09096, over 3858861.80 frames. ], batch size: 70, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:31:48,692 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 21:32:02,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1837180.0, ans=0.2 2024-08-12 21:32:07,076 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 21:32:11,239 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 21:32:12,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.082e+01 2.453e+01 2.781e+01 3.151e+01 8.550e+01, threshold=5.562e+01, percent-clipped=1.0 2024-08-12 21:32:13,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1837280.0, ans=0.0 2024-08-12 21:32:37,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. 
limit=15.0 2024-08-12 21:32:42,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1837480.0, ans=0.07 2024-08-12 21:32:55,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9850, loss[loss=0.1091, beats_loss=0.01019, ecapa_loss=0.0001656, whisper_loss=0.09728, over 16824.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001726, whisper_loss=0.09132, over 3843307.72 frames. ], batch size: 68, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:33:30,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=12.0 2024-08-12 21:33:32,290 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 21:33:39,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1837880.0, ans=0.125 2024-08-12 21:33:45,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1837880.0, ans=0.0 2024-08-12 21:34:00,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1837980.0, ans=0.125 2024-08-12 21:34:04,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1837980.0, ans=0.125 2024-08-12 21:34:06,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1838080.0, ans=0.1 2024-08-12 21:34:06,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9900, loss[loss=0.09472, beats_loss=0.01005, ecapa_loss=0.0002032, whisper_loss=0.08264, over 20525.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001732, whisper_loss=0.09084, over 3849142.67 frames. 
], batch size: 86, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:34:09,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-12 21:34:15,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1838080.0, ans=0.2 2024-08-12 21:34:20,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-12 21:34:23,066 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 21:34:27,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1838180.0, ans=0.1 2024-08-12 21:34:34,613 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 21:34:36,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.567e+01 2.799e+01 3.140e+01 5.231e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 21:34:43,740 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 21:34:49,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1838380.0, ans=0.125 2024-08-12 21:34:52,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1838380.0, ans=0.0 2024-08-12 21:35:07,602 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 21:35:12,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1838480.0, ans=0.2 2024-08-12 21:35:17,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-08-12 21:35:19,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1838480.0, ans=0.125 2024-08-12 21:35:21,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 9950, loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001555, whisper_loss=0.08961, over 21041.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01102, ecapa_loss=0.0001721, whisper_loss=0.09078, over 3856356.26 frames. ], batch size: 82, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:35:22,052 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-12 21:35:23,640 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 21:36:03,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1838780.0, ans=0.0 2024-08-12 21:36:18,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1838880.0, ans=0.0 2024-08-12 21:36:19,148 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 21:36:29,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1838980.0, ans=0.0 2024-08-12 21:36:36,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10000, loss[loss=0.08442, beats_loss=0.01453, ecapa_loss=0.0001337, whisper_loss=0.06855, over 18932.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001726, whisper_loss=0.09118, over 3844139.36 frames. ], batch size: 78, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:36:38,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=12.0 2024-08-12 21:36:58,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-08-12 21:37:02,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1839180.0, ans=0.125 2024-08-12 21:37:06,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.540e+01 2.812e+01 3.144e+01 2.734e+02, threshold=5.624e+01, percent-clipped=2.0 2024-08-12 21:37:08,607 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 21:37:11,517 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 21:37:16,012 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 21:37:19,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1839380.0, ans=0.0 2024-08-12 21:37:48,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10050, loss[loss=0.1111, beats_loss=0.011, ecapa_loss=0.0001583, whisper_loss=0.09853, over 24052.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001728, whisper_loss=0.09121, over 3840048.15 frames. ], batch size: 94, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:37:49,194 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-12 21:38:02,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1839580.0, ans=0.125 2024-08-12 21:38:47,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1839880.0, ans=0.125 2024-08-12 21:38:50,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1839880.0, ans=0.2 2024-08-12 21:38:54,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.82 vs. limit=22.5 2024-08-12 21:39:02,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1839980.0, ans=0.125 2024-08-12 21:39:09,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-12 21:39:12,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10100, loss[loss=0.08862, beats_loss=0.01293, ecapa_loss=0.0001721, whisper_loss=0.07397, over 20991.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01095, ecapa_loss=0.0001723, whisper_loss=0.09087, over 3857453.10 frames. ], batch size: 88, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:39:24,346 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-12 21:39:38,099 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 21:39:45,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.534e+01 2.755e+01 3.172e+01 9.610e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-12 21:40:00,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1840380.0, ans=0.2 2024-08-12 21:40:34,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10150, loss[loss=0.08874, beats_loss=0.01042, ecapa_loss=0.0001672, whisper_loss=0.07665, over 15140.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01092, ecapa_loss=0.0001727, whisper_loss=0.09043, over 3842867.63 frames. ], batch size: 59, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:40:39,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1840580.0, ans=0.125 2024-08-12 21:40:41,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1840580.0, ans=0.0 2024-08-12 21:40:41,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1840580.0, ans=0.0 2024-08-12 21:40:42,614 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 21:40:51,303 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 21:40:51,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1840580.0, ans=0.125 2024-08-12 21:40:51,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-12 21:40:59,017 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-12 21:41:06,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1840680.0, ans=0.125 2024-08-12 21:41:23,183 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 21:41:27,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-12 21:41:37,787 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 21:41:54,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1840980.0, ans=0.125 2024-08-12 21:41:57,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1840980.0, ans=0.0 2024-08-12 21:41:59,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1840980.0, ans=0.125 2024-08-12 21:42:08,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10200, loss[loss=0.09304, beats_loss=0.01231, ecapa_loss=0.0001696, whisper_loss=0.07903, over 16848.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001741, whisper_loss=0.09093, over 3827038.50 frames. 
], batch size: 66, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:42:35,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1841180.0, ans=0.2 2024-08-12 21:42:45,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1841180.0, ans=0.125 2024-08-12 21:42:53,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.450e+01 2.670e+01 3.042e+01 4.548e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-12 21:43:42,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1841480.0, ans=0.125 2024-08-12 21:43:57,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10250, loss[loss=0.1057, beats_loss=0.008134, ecapa_loss=0.0002067, whisper_loss=0.09545, over 18809.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001743, whisper_loss=0.09126, over 3835826.09 frames. ], batch size: 73, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:44:26,124 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 21:44:42,057 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-12 21:44:45,971 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 21:45:00,222 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 21:45:20,908 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:45:24,581 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 21:45:33,848 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 21:45:47,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10300, loss[loss=0.1012, beats_loss=0.01104, ecapa_loss=0.0001435, whisper_loss=0.08872, over 14387.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001725, whisper_loss=0.09129, over 3834404.49 frames. ], batch size: 54, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:46:01,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1842080.0, ans=0.1 2024-08-12 21:46:05,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1842080.0, ans=0.1 2024-08-12 21:46:25,312 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 21:46:37,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.506e+01 2.751e+01 3.160e+01 4.441e+01, threshold=5.501e+01, percent-clipped=0.0 2024-08-12 21:46:58,414 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 21:47:05,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1842380.0, ans=0.125 2024-08-12 21:47:33,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10350, loss[loss=0.1091, beats_loss=0.00986, ecapa_loss=0.0001608, whisper_loss=0.09762, over 16165.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01083, ecapa_loss=0.0001717, whisper_loss=0.0921, over 3865003.32 frames. ], batch size: 62, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:47:34,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. 
limit=15.0 2024-08-12 21:48:20,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1842880.0, ans=0.09899494936611666 2024-08-12 21:48:45,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10400, loss[loss=0.08548, beats_loss=0.01427, ecapa_loss=0.0001504, whisper_loss=0.0697, over 22257.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01088, ecapa_loss=0.0001706, whisper_loss=0.09207, over 3887222.57 frames. ], batch size: 94, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:48:46,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1843080.0, ans=0.0 2024-08-12 21:48:46,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1843080.0, ans=0.0 2024-08-12 21:48:56,456 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-12 21:49:11,545 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 21:49:16,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.438e+01 2.753e+01 3.076e+01 5.598e+01, threshold=5.505e+01, percent-clipped=1.0 2024-08-12 21:49:17,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1843280.0, ans=0.125 2024-08-12 21:49:47,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1843480.0, ans=0.125 2024-08-12 21:49:59,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10450, loss[loss=0.09832, beats_loss=0.01277, ecapa_loss=0.0001582, whisper_loss=0.08398, over 20899.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001698, whisper_loss=0.09184, over 3884603.87 frames. 
], batch size: 84, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:50:13,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1843580.0, ans=0.0 2024-08-12 21:50:33,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1843780.0, ans=0.125 2024-08-12 21:50:44,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1843880.0, ans=0.125 2024-08-12 21:50:48,846 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 21:50:57,535 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 21:51:14,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10500, loss[loss=0.09019, beats_loss=0.01459, ecapa_loss=0.0001454, whisper_loss=0.07415, over 22635.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001716, whisper_loss=0.09124, over 3856421.91 frames. ], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:51:14,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1844080.0, ans=0.0 2024-08-12 21:51:22,092 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 21:51:34,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.06 vs. 
limit=15.0 2024-08-12 21:51:35,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1844180.0, ans=0.125 2024-08-12 21:51:45,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.358e+01 2.688e+01 3.093e+01 1.105e+02, threshold=5.376e+01, percent-clipped=1.0 2024-08-12 21:51:50,737 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:51:52,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-08-12 21:51:54,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=12.0 2024-08-12 21:52:15,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1844480.0, ans=0.07 2024-08-12 21:52:16,833 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 21:52:20,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1844480.0, ans=0.125 2024-08-12 21:52:30,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10550, loss[loss=0.1003, beats_loss=0.01104, ecapa_loss=0.0001906, whisper_loss=0.08734, over 13800.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001716, whisper_loss=0.09102, over 3874686.26 frames. 
], batch size: 56, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:52:43,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1844580.0, ans=0.125 2024-08-12 21:52:50,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1844680.0, ans=0.0 2024-08-12 21:52:55,040 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.695e+00 2024-08-12 21:53:05,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-12 21:53:37,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2024-08-12 21:53:37,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-12 21:53:42,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1844980.0, ans=0.1 2024-08-12 21:53:44,455 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:53:44,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1844980.0, ans=0.125 2024-08-12 21:53:48,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10600, loss[loss=0.104, beats_loss=0.009398, ecapa_loss=0.0001614, whisper_loss=0.09296, over 24215.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001709, whisper_loss=0.09073, over 3882488.64 frames. 
], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:54:08,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1845180.0, ans=0.0 2024-08-12 21:54:15,432 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 21:54:15,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1845180.0, ans=0.125 2024-08-12 21:54:20,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1845280.0, ans=0.125 2024-08-12 21:54:21,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.510e+01 2.765e+01 3.245e+01 5.665e+01, threshold=5.530e+01, percent-clipped=1.0 2024-08-12 21:54:49,423 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 21:54:58,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1845480.0, ans=0.09899494936611666 2024-08-12 21:55:04,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10650, loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001706, whisper_loss=0.09165, over 16424.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01099, ecapa_loss=0.0001704, whisper_loss=0.09036, over 3868613.31 frames. ], batch size: 63, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:55:05,936 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-12 21:55:25,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1845680.0, ans=0.2 2024-08-12 21:55:50,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.33 vs. 
limit=15.0 2024-08-12 21:55:53,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1845880.0, ans=0.125 2024-08-12 21:55:54,739 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-12 21:55:55,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1845880.0, ans=0.09899494936611666 2024-08-12 21:55:59,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1845880.0, ans=0.125 2024-08-12 21:56:09,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1845980.0, ans=0.125 2024-08-12 21:56:14,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1845980.0, ans=0.125 2024-08-12 21:56:23,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10700, loss[loss=0.1002, beats_loss=0.01217, ecapa_loss=0.0002155, whisper_loss=0.08587, over 19584.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001695, whisper_loss=0.09161, over 3886873.62 frames. ], batch size: 81, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:56:25,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1846080.0, ans=0.125 2024-08-12 21:56:27,873 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 21:56:42,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1846180.0, ans=0.125 2024-08-12 21:56:47,438 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 21:56:55,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.619e+01 2.989e+01 3.264e+01 5.454e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-12 21:56:58,717 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 21:57:01,498 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 21:57:14,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1846380.0, ans=0.125 2024-08-12 21:57:22,364 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 21:57:40,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10750, loss[loss=0.103, beats_loss=0.0112, ecapa_loss=0.0001562, whisper_loss=0.09023, over 18365.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.0001698, whisper_loss=0.09247, over 3906714.28 frames. ], batch size: 72, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:57:42,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1846580.0, ans=0.125 2024-08-12 21:58:03,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1846680.0, ans=0.0 2024-08-12 21:58:04,541 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 21:58:46,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1846980.0, ans=0.1 2024-08-12 21:58:53,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10800, loss[loss=0.1231, beats_loss=0.008796, ecapa_loss=0.0001946, whisper_loss=0.1124, over 20173.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01084, ecapa_loss=0.0001713, whisper_loss=0.09305, over 3899018.82 frames. ], batch size: 80, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:58:54,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=15.0 2024-08-12 21:59:10,571 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 21:59:12,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1847180.0, ans=0.2 2024-08-12 21:59:16,564 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 21:59:23,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.464e+01 2.831e+01 3.292e+01 5.711e+01, threshold=5.661e+01, percent-clipped=0.0 2024-08-12 21:59:30,734 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 21:59:35,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-12 21:59:42,175 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 21:59:51,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=8.0 2024-08-12 21:59:59,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1847480.0, ans=0.0 2024-08-12 22:00:05,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10850, loss[loss=0.09479, beats_loss=0.01116, ecapa_loss=0.0001982, whisper_loss=0.08164, over 20236.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01083, ecapa_loss=0.0001712, whisper_loss=0.09273, over 3929872.67 frames. 
], batch size: 84, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:00:05,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-08-12 22:00:09,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1847580.0, ans=0.125 2024-08-12 22:00:44,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1847780.0, ans=0.0 2024-08-12 22:00:46,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1847780.0, ans=0.125 2024-08-12 22:00:52,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1847880.0, ans=0.0 2024-08-12 22:00:55,272 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 22:00:57,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2024-08-12 22:01:04,907 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 22:01:17,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10900, loss[loss=0.1164, beats_loss=0.009514, ecapa_loss=0.0001483, whisper_loss=0.1054, over 19899.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.09252, over 3924365.36 frames. ], batch size: 75, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:01:17,488 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 22:01:25,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1848080.0, ans=0.2 2024-08-12 22:01:32,576 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 22:01:40,452 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 22:01:40,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1848180.0, ans=0.125 2024-08-12 22:01:48,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.539e+01 2.752e+01 3.152e+01 5.586e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 22:01:53,672 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 22:01:55,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1848280.0, ans=0.07 2024-08-12 22:02:10,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1848380.0, ans=15.0 2024-08-12 22:02:21,923 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 22:02:26,369 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 35 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 22:02:32,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 10950, loss[loss=0.1291, beats_loss=0.009632, ecapa_loss=0.0001898, whisper_loss=0.1176, over 18144.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.09264, over 3923785.47 frames. 
], batch size: 73, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:02:43,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1848580.0, ans=0.04949747468305833 2024-08-12 22:03:47,205 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11000, loss[loss=0.09749, beats_loss=0.01234, ecapa_loss=0.000145, whisper_loss=0.0837, over 23974.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.000171, whisper_loss=0.09265, over 3932377.35 frames. ], batch size: 93, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:03:51,209 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 22:03:52,512 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 22:03:52,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1849080.0, ans=0.125 2024-08-12 22:04:18,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.465e+01 2.797e+01 3.199e+01 6.867e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 22:04:25,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1849280.0, ans=0.0 2024-08-12 22:04:25,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1849280.0, ans=0.125 2024-08-12 22:04:36,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1849380.0, ans=0.05 2024-08-12 22:04:58,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11050, loss[loss=0.1036, beats_loss=0.009287, ecapa_loss=0.000163, whisper_loss=0.09273, over 22886.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001709, whisper_loss=0.09236, over 3918454.53 frames. ], batch size: 86, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:04:59,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1849580.0, ans=0.125 2024-08-12 22:05:02,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1849580.0, ans=0.125 2024-08-12 22:05:16,054 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 22:05:19,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1849680.0, ans=0.1 2024-08-12 22:05:25,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1849680.0, ans=0.125 2024-08-12 22:05:47,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1849880.0, ans=0.0 2024-08-12 22:05:51,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-12 22:05:53,650 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 22:05:55,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1849880.0, ans=0.1 2024-08-12 22:06:11,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1850080.0, ans=0.125 2024-08-12 22:06:11,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11100, loss[loss=0.1171, beats_loss=0.01158, ecapa_loss=0.0001104, whisper_loss=0.1044, over 19311.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001709, whisper_loss=0.0917, over 3903857.68 frames. ], batch size: 70, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:06:28,905 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 22:06:30,201 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 22:06:37,568 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 22:06:39,003 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 22:06:40,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1850280.0, ans=0.0 2024-08-12 22:06:44,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.458e+01 2.677e+01 3.068e+01 5.581e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-12 22:06:57,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=1850380.0, ans=22.5 2024-08-12 22:06:58,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1850380.0, ans=0.125 2024-08-12 22:06:59,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-12 22:07:02,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1850380.0, ans=0.1 2024-08-12 22:07:26,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11150, loss[loss=0.09551, beats_loss=0.0124, ecapa_loss=0.0001575, whisper_loss=0.08154, over 21691.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001708, whisper_loss=0.09171, over 3890452.18 frames. ], batch size: 89, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:07:36,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1850580.0, ans=0.07 2024-08-12 22:07:58,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1850780.0, ans=0.0 2024-08-12 22:08:02,629 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 22:08:12,275 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 22:08:21,023 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 22:08:24,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1850880.0, ans=0.125 2024-08-12 22:08:41,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-12 22:08:41,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11200, loss[loss=0.0899, beats_loss=0.01099, ecapa_loss=0.0002252, whisper_loss=0.07666, over 21822.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001709, whisper_loss=0.09176, over 3851470.75 frames. ], batch size: 94, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:08:41,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1851080.0, ans=0.1 2024-08-12 22:08:50,985 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 22:08:58,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-12 22:09:09,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1851180.0, ans=0.0 2024-08-12 22:09:14,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.512e+01 2.839e+01 3.173e+01 1.150e+02, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 22:09:18,533 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
35 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 22:09:18,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1851280.0, ans=0.125 2024-08-12 22:09:54,907 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 22:09:58,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2024-08-12 22:10:00,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11250, loss[loss=0.1116, beats_loss=0.01023, ecapa_loss=0.0001552, whisper_loss=0.09986, over 21720.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001724, whisper_loss=0.09208, over 3864075.20 frames. ], batch size: 83, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:10:14,551 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 22:10:34,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1851780.0, ans=0.125 2024-08-12 22:10:47,950 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 22:10:50,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1851880.0, ans=0.2 2024-08-12 22:10:56,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.29 vs. 
limit=22.5 2024-08-12 22:11:17,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1852080.0, ans=0.125 2024-08-12 22:11:18,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11300, loss[loss=0.1243, beats_loss=0.009437, ecapa_loss=0.0001584, whisper_loss=0.1133, over 22838.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.000173, whisper_loss=0.09192, over 3862346.33 frames. ], batch size: 89, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:11:37,339 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 22:11:39,013 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 22:11:49,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1852180.0, ans=0.125 2024-08-12 22:11:54,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1852280.0, ans=0.125 2024-08-12 22:11:55,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.542e+01 2.832e+01 3.166e+01 7.074e+01, threshold=5.665e+01, percent-clipped=1.0 2024-08-12 22:11:59,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1852280.0, ans=0.125 2024-08-12 22:12:03,874 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 22:12:04,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1852280.0, ans=0.125 2024-08-12 22:12:08,882 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 22:12:32,904 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 22:12:38,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1852480.0, ans=0.07 2024-08-12 22:12:40,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11350, loss[loss=0.1062, beats_loss=0.01013, ecapa_loss=0.0001743, whisper_loss=0.09432, over 17257.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001714, whisper_loss=0.09131, over 3854050.15 frames. ], batch size: 68, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:12:53,068 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 22:12:53,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1852580.0, ans=0.0 2024-08-12 22:12:55,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1852680.0, ans=0.0 2024-08-12 22:12:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1852680.0, ans=0.125 2024-08-12 22:13:14,291 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.542e+00 2024-08-12 22:13:15,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1852780.0, ans=0.125 2024-08-12 22:13:17,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1852780.0, ans=0.125 2024-08-12 22:13:48,902 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 22:14:02,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11400, loss[loss=0.1018, beats_loss=0.0111, ecapa_loss=0.0001905, whisper_loss=0.08884, over 15718.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001729, whisper_loss=0.09141, over 3865160.74 frames. ], batch size: 63, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:14:18,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1853180.0, ans=0.125 2024-08-12 22:14:20,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.73 vs. limit=22.5 2024-08-12 22:14:36,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.651e+01 3.000e+01 3.420e+01 5.421e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-12 22:14:37,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1853280.0, ans=0.125 2024-08-12 22:15:12,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-12 22:15:19,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11450, loss[loss=0.1247, beats_loss=0.009268, ecapa_loss=0.0001884, whisper_loss=0.1135, over 16948.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001738, whisper_loss=0.09131, over 3857527.18 frames. ], batch size: 65, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:15:38,381 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 22:15:38,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1853680.0, ans=0.125 2024-08-12 22:15:48,880 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 22:16:07,699 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 22:16:14,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-12 22:16:30,609 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 22:16:41,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11500, loss[loss=0.09283, beats_loss=0.01213, ecapa_loss=0.000144, whisper_loss=0.07926, over 21288.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001711, whisper_loss=0.09246, over 3902816.72 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:16:53,070 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 22:17:17,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.445e+01 2.643e+01 2.952e+01 4.086e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-12 22:17:17,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1854280.0, ans=0.0 2024-08-12 22:17:23,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=12.0 2024-08-12 22:17:46,524 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-12 22:17:46,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1854480.0, ans=0.125 2024-08-12 22:18:00,276 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 22:18:02,909 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 22:18:03,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11550, loss[loss=0.1079, beats_loss=0.008601, ecapa_loss=0.0001704, whisper_loss=0.09763, over 19483.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01082, ecapa_loss=0.0001713, whisper_loss=0.09221, over 3865556.83 frames. ], batch size: 74, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:18:04,434 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 22:18:10,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1854580.0, ans=0.1 2024-08-12 22:18:14,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1854580.0, ans=0.2 2024-08-12 22:18:15,925 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 22:18:16,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-12 22:18:22,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1854680.0, ans=0.125 2024-08-12 22:18:52,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1854880.0, ans=0.125 2024-08-12 22:19:04,339 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 22:19:24,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11600, loss[loss=0.1138, beats_loss=0.009604, ecapa_loss=0.0001639, whisper_loss=0.1025, over 24631.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001709, whisper_loss=0.09213, over 3871989.46 frames. 
], batch size: 96, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:20:00,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.514e+01 2.737e+01 3.107e+01 4.746e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 22:20:02,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1855280.0, ans=0.0 2024-08-12 22:20:07,744 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 22:20:08,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1855280.0, ans=0.125 2024-08-12 22:20:18,239 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 22:20:30,180 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 22:20:34,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1855480.0, ans=0.125 2024-08-12 22:20:34,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1855480.0, ans=0.04949747468305833 2024-08-12 22:20:43,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11650, loss[loss=0.1225, beats_loss=0.009381, ecapa_loss=0.0001759, whisper_loss=0.1114, over 22152.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.0001717, whisper_loss=0.09195, over 3874534.21 frames. 
], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:20:44,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1855580.0, ans=0.1 2024-08-12 22:20:55,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1855580.0, ans=0.0 2024-08-12 22:21:01,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1855680.0, ans=0.1 2024-08-12 22:21:02,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1855680.0, ans=0.2 2024-08-12 22:21:25,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1855780.0, ans=0.5 2024-08-12 22:21:37,531 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 22:21:55,892 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 22:22:03,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11700, loss[loss=0.1358, beats_loss=0.009275, ecapa_loss=0.0001773, whisper_loss=0.1247, over 16643.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001697, whisper_loss=0.09196, over 3867405.68 frames. 
], batch size: 64, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:22:12,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1856080.0, ans=0.125 2024-08-12 22:22:39,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.485e+01 2.712e+01 3.027e+01 7.497e+01, threshold=5.424e+01, percent-clipped=1.0 2024-08-12 22:23:06,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1856380.0, ans=0.0 2024-08-12 22:23:07,946 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 22:23:16,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1856480.0, ans=0.0 2024-08-12 22:23:27,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11750, loss[loss=0.09752, beats_loss=0.01033, ecapa_loss=0.000162, whisper_loss=0.08556, over 17471.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01108, ecapa_loss=0.0001697, whisper_loss=0.09155, over 3894325.70 frames. ], batch size: 68, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:23:40,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1856580.0, ans=0.125 2024-08-12 22:23:41,794 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 22:23:42,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1856580.0, ans=0.125 2024-08-12 22:23:45,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1856680.0, ans=0.125 2024-08-12 22:23:49,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2024-08-12 22:23:57,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1856680.0, ans=0.125 2024-08-12 22:24:11,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1856780.0, ans=0.125 2024-08-12 22:24:35,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1856980.0, ans=0.0 2024-08-12 22:24:35,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2024-08-12 22:24:43,452 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 22:24:45,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-12 22:24:45,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11800, loss[loss=0.1101, beats_loss=0.01244, ecapa_loss=0.0001387, whisper_loss=0.09625, over 22500.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001694, whisper_loss=0.09171, over 3887497.95 frames. 
], batch size: 88, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:24:46,595 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 22:24:48,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1857080.0, ans=0.07 2024-08-12 22:25:08,578 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 22:25:21,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.559e+01 2.833e+01 3.342e+01 5.764e+01, threshold=5.666e+01, percent-clipped=1.0 2024-08-12 22:25:29,492 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 22:25:29,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1857280.0, ans=0.0 2024-08-12 22:25:59,906 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 22:26:06,702 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11850, loss[loss=0.107, beats_loss=0.01125, ecapa_loss=0.0001856, whisper_loss=0.09389, over 22137.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01103, ecapa_loss=0.0001693, whisper_loss=0.09193, over 3893770.23 frames. ], batch size: 89, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:26:09,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. 
limit=15.0 2024-08-12 22:26:20,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1857580.0, ans=0.125 2024-08-12 22:26:40,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1857780.0, ans=0.125 2024-08-12 22:26:42,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1857780.0, ans=0.125 2024-08-12 22:26:43,983 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 22:27:21,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1858080.0, ans=0.125 2024-08-12 22:27:21,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-08-12 22:27:22,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11900, loss[loss=0.1002, beats_loss=0.01238, ecapa_loss=0.000159, whisper_loss=0.08626, over 21957.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01103, ecapa_loss=0.0001686, whisper_loss=0.09198, over 3925847.92 frames. ], batch size: 90, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:27:35,298 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.654e-02 2024-08-12 22:27:43,625 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 37 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 22:27:48,849 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 22:27:52,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.540e+01 2.783e+01 3.070e+01 4.680e+01, threshold=5.566e+01, percent-clipped=0.0 2024-08-12 22:27:54,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-12 22:28:00,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1858280.0, ans=0.1 2024-08-12 22:28:16,008 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 22:28:17,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1858480.0, ans=0.125 2024-08-12 22:28:24,225 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 25 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-12 22:28:31,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 11950, loss[loss=0.07804, beats_loss=0.01149, ecapa_loss=0.0002076, whisper_loss=0.06448, over 18577.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001711, whisper_loss=0.09248, over 3910318.27 frames. ], batch size: 80, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:28:39,015 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 22:28:39,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1858580.0, ans=0.125 2024-08-12 22:28:40,431 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 22:28:44,588 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 22:28:47,484 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 12 from Vox, 53 fro AS 2024-08-12 22:28:49,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1858680.0, ans=0.0 2024-08-12 22:29:09,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-12 22:29:20,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1858880.0, ans=0.05 2024-08-12 22:29:20,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1858880.0, ans=0.125 2024-08-12 22:29:35,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-08-12 22:29:37,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.88 vs. limit=10.0 2024-08-12 22:29:39,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12000, loss[loss=0.1034, beats_loss=0.01174, ecapa_loss=0.0002406, whisper_loss=0.08923, over 22116.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001713, whisper_loss=0.09157, over 3877633.69 frames. ], batch size: 94, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:29:39,564 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 22:30:19,846 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0005805, whisper_loss=0.2504, over 922467.00 frames. 2024-08-12 22:30:37,831 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on SV_voxceleb1: loss=0.004691, beats_loss=0, ecapa_loss=0.0004691, whisper_loss=0, over 939242.00 frames. 
2024-08-12 22:32:33,561 INFO [train_multi_KD3.py:1149] (3/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 22:32:33,565 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 22:33:04,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.535e+01 2.857e+01 3.270e+01 5.667e+01, threshold=5.714e+01, percent-clipped=0.0 2024-08-12 22:33:13,118 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 22:33:13,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1859280.0, ans=0.125 2024-08-12 22:33:16,235 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 22:33:22,298 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 22:33:22,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1859380.0, ans=0.2 2024-08-12 22:33:36,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1859480.0, ans=0.125 2024-08-12 22:33:37,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.41 vs. limit=10.0 2024-08-12 22:33:40,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=12.0 2024-08-12 22:33:46,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12050, loss[loss=0.1007, beats_loss=0.009924, ecapa_loss=0.0001613, whisper_loss=0.08916, over 15506.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001712, whisper_loss=0.09139, over 3893694.33 frames. 
], batch size: 60, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:33:54,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-08-12 22:34:07,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1859680.0, ans=0.04949747468305833 2024-08-12 22:34:21,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1859780.0, ans=0.2 2024-08-12 22:34:23,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1859780.0, ans=0.125 2024-08-12 22:34:29,276 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 22:34:46,648 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.283e+01 2024-08-12 22:34:58,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12100, loss[loss=0.1146, beats_loss=0.0103, ecapa_loss=0.0001691, whisper_loss=0.1026, over 19613.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01102, ecapa_loss=0.0001719, whisper_loss=0.0915, over 3895958.48 frames. 
], batch size: 76, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:35:02,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1860080.0, ans=0.125 2024-08-12 22:35:13,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1860180.0, ans=0.125 2024-08-12 22:35:19,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1860180.0, ans=0.0 2024-08-12 22:35:21,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1860180.0, ans=0.125 2024-08-12 22:35:27,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.529e+01 2.799e+01 3.028e+01 6.026e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 22:35:32,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1860280.0, ans=0.125 2024-08-12 22:35:34,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1860280.0, ans=0.2 2024-08-12 22:35:36,880 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 22:35:42,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-12 22:36:08,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12150, loss[loss=0.07149, beats_loss=0.01111, ecapa_loss=0.000173, whisper_loss=0.05865, over 14379.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001711, whisper_loss=0.09177, over 3883777.89 frames. 
], batch size: 58, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:36:10,965 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 22:36:23,403 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 22:36:25,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1860680.0, ans=0.2 2024-08-12 22:36:26,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1860680.0, ans=0.125 2024-08-12 22:36:58,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1860880.0, ans=0.125 2024-08-12 22:37:00,791 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 22:37:06,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1860980.0, ans=0.2 2024-08-12 22:37:12,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1860980.0, ans=0.2 2024-08-12 22:37:18,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12200, loss[loss=0.08188, beats_loss=0.00997, ecapa_loss=0.0001664, whisper_loss=0.07025, over 17430.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01102, ecapa_loss=0.0001721, whisper_loss=0.09126, over 3867125.81 frames. ], batch size: 70, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:37:38,807 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-12 22:37:49,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.525e+01 2.741e+01 3.168e+01 5.471e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-12 22:37:55,990 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 22:37:58,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1861280.0, ans=0.1 2024-08-12 22:38:03,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1861380.0, ans=0.125 2024-08-12 22:38:26,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2024-08-12 22:38:29,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12250, loss[loss=0.09008, beats_loss=0.01257, ecapa_loss=0.000166, whisper_loss=0.07586, over 21641.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001739, whisper_loss=0.09145, over 3892153.41 frames. ], batch size: 88, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:38:31,165 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 22:38:32,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1861580.0, ans=0.0 2024-08-12 22:38:35,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-12 22:38:37,212 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 22:38:45,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1861680.0, ans=0.125 2024-08-12 22:38:48,408 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 22:38:50,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1861680.0, ans=0.2 2024-08-12 22:38:59,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-12 22:39:13,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1861880.0, ans=0.0 2024-08-12 22:39:40,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12300, loss[loss=0.07798, beats_loss=0.01084, ecapa_loss=0.0002099, whisper_loss=0.06504, over 21796.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01095, ecapa_loss=0.0001747, whisper_loss=0.09082, over 3909298.90 frames. ], batch size: 92, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:39:46,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1862080.0, ans=0.0 2024-08-12 22:40:02,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1862180.0, ans=0.125 2024-08-12 22:40:05,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1862180.0, ans=0.0 2024-08-12 22:40:06,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.490e+01 2024-08-12 22:40:11,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.506e+01 2.717e+01 3.049e+01 5.234e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 22:40:18,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. 
limit=6.0 2024-08-12 22:40:31,448 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 22:40:47,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1862580.0, ans=0.125 2024-08-12 22:40:48,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12350, loss[loss=0.1099, beats_loss=0.01293, ecapa_loss=0.0001295, whisper_loss=0.09565, over 23119.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001736, whisper_loss=0.09146, over 3917754.94 frames. ], batch size: 91, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:40:50,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1862580.0, ans=0.07 2024-08-12 22:40:58,865 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 22:41:20,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1862780.0, ans=0.125 2024-08-12 22:41:22,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1862780.0, ans=0.09899494936611666 2024-08-12 22:41:36,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1862880.0, ans=0.125 2024-08-12 22:41:41,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-12 22:41:42,813 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 22:41:43,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1862880.0, ans=0.2 2024-08-12 22:41:47,312 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 22:41:56,730 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 22:41:59,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12400, loss[loss=0.1025, beats_loss=0.01218, ecapa_loss=0.000113, whisper_loss=0.08917, over 16782.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001727, whisper_loss=0.09163, over 3879912.09 frames. ], batch size: 63, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:42:29,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.619e+01 2.853e+01 3.347e+01 1.216e+02, threshold=5.705e+01, percent-clipped=2.0 2024-08-12 22:42:39,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1863380.0, ans=0.125 2024-08-12 22:42:42,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1863380.0, ans=0.0 2024-08-12 22:42:49,269 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 22:43:04,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1863480.0, ans=0.125 2024-08-12 22:43:08,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12450, loss[loss=0.09972, beats_loss=0.01117, ecapa_loss=0.0001635, whisper_loss=0.08692, over 17633.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001726, whisper_loss=0.09159, over 3869531.46 frames. 
], batch size: 72, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:43:09,177 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 22:43:15,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1863580.0, ans=0.125 2024-08-12 22:43:19,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1863580.0, ans=0.0 2024-08-12 22:43:37,651 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 22:44:05,215 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 22:44:12,883 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 22:44:19,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12500, loss[loss=0.08721, beats_loss=0.01309, ecapa_loss=0.0001784, whisper_loss=0.07234, over 19932.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001704, whisper_loss=0.09168, over 3868026.23 frames. ], batch size: 83, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:44:26,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1864080.0, ans=0.125 2024-08-12 22:44:46,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.40 vs. 
limit=10.0 2024-08-12 22:44:48,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1864280.0, ans=0.1 2024-08-12 22:44:49,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.439e+01 2.730e+01 3.074e+01 7.978e+01, threshold=5.460e+01, percent-clipped=1.0 2024-08-12 22:45:07,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-12 22:45:13,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1864480.0, ans=0.125 2024-08-12 22:45:22,874 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 22:45:26,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12550, loss[loss=0.1149, beats_loss=0.01198, ecapa_loss=0.0001376, whisper_loss=0.1016, over 23260.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.09244, over 3882812.72 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:45:28,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1864580.0, ans=0.0 2024-08-12 22:45:28,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1864580.0, ans=0.125 2024-08-12 22:45:52,713 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 22:46:11,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1864880.0, ans=0.125 2024-08-12 22:46:11,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5 2024-08-12 22:46:33,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12600, loss[loss=0.1134, beats_loss=0.01005, ecapa_loss=0.0001726, whisper_loss=0.1017, over 18434.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001706, whisper_loss=0.09281, over 3916525.77 frames. ], batch size: 74, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:46:35,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-08-12 22:46:38,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1865080.0, ans=0.2 2024-08-12 22:46:56,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-12 22:46:58,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1865180.0, ans=0.125 2024-08-12 22:47:03,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.534e+01 2.817e+01 3.269e+01 5.497e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 22:47:04,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5 2024-08-12 22:47:05,208 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
37 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-12 22:47:15,844 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-12 22:47:28,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-12 22:47:29,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1865480.0, ans=0.2 2024-08-12 22:47:42,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12650, loss[loss=0.1078, beats_loss=0.01174, ecapa_loss=0.0002134, whisper_loss=0.09396, over 21617.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01091, ecapa_loss=0.0001725, whisper_loss=0.09318, over 3926635.57 frames. ], batch size: 91, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:47:43,963 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 22:47:57,366 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 22:48:11,236 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 22:48:14,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1865780.0, ans=0.025 2024-08-12 22:48:26,067 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 22:48:33,855 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-12 22:48:50,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12700, loss[loss=0.1152, beats_loss=0.009107, ecapa_loss=0.0001469, whisper_loss=0.1047, over 17827.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01092, ecapa_loss=0.0001735, whisper_loss=0.09338, over 3921667.40 frames. 
], batch size: 66, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:49:18,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866280.0, ans=0.1 2024-08-12 22:49:21,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.468e+01 2.692e+01 3.051e+01 4.394e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-12 22:49:26,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1866280.0, ans=0.125 2024-08-12 22:49:31,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1866380.0, ans=0.0 2024-08-12 22:49:58,290 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 22:49:59,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12750, loss[loss=0.1262, beats_loss=0.007487, ecapa_loss=0.0002002, whisper_loss=0.1167, over 16493.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.000172, whisper_loss=0.09293, over 3885107.02 frames. ], batch size: 64, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:49:59,830 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 22:50:00,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-12 22:50:04,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1866580.0, ans=0.0 2024-08-12 22:50:08,910 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 22:50:11,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1866680.0, ans=0.04949747468305833 2024-08-12 22:50:14,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1866680.0, ans=0.0 2024-08-12 22:50:16,469 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 22:50:17,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2024-08-12 22:50:31,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1866780.0, ans=0.05 2024-08-12 22:50:35,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1866780.0, ans=0.0 2024-08-12 22:50:42,929 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 22:51:00,388 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 22:51:02,004 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-12 22:51:05,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12800, loss[loss=0.1105, beats_loss=0.009386, ecapa_loss=0.0001699, whisper_loss=0.09946, over 22534.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001726, whisper_loss=0.09239, over 3890787.14 frames. ], batch size: 86, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:51:11,156 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 22:51:23,711 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 22:51:24,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1867180.0, ans=0.2 2024-08-12 22:51:35,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.415e+01 2.675e+01 2.893e+01 6.675e+01, threshold=5.350e+01, percent-clipped=1.0 2024-08-12 22:51:47,666 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 22:51:57,335 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 22:52:04,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1867480.0, ans=0.125 2024-08-12 22:52:13,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12850, loss[loss=0.1082, beats_loss=0.01128, ecapa_loss=0.0001489, whisper_loss=0.09548, over 22743.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001721, whisper_loss=0.09171, over 3866983.72 frames. ], batch size: 89, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:52:25,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1867680.0, ans=0.0 2024-08-12 22:52:25,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1867680.0, ans=0.125 2024-08-12 22:52:32,105 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
28 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-12 22:52:33,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1867680.0, ans=10.0 2024-08-12 22:52:48,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1867780.0, ans=0.125 2024-08-12 22:52:49,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1867780.0, ans=0.125 2024-08-12 22:53:22,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1868080.0, ans=0.2 2024-08-12 22:53:23,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12900, loss[loss=0.1027, beats_loss=0.008312, ecapa_loss=0.0001701, whisper_loss=0.09264, over 18375.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001708, whisper_loss=0.09177, over 3852987.25 frames. ], batch size: 69, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:53:29,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1868080.0, ans=0.0 2024-08-12 22:53:31,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1868080.0, ans=0.1 2024-08-12 22:53:33,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=12.0 2024-08-12 22:53:41,379 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 22:53:52,273 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
21 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-12 22:53:53,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.434e+01 2.743e+01 3.168e+01 4.693e+01, threshold=5.486e+01, percent-clipped=0.0 2024-08-12 22:54:13,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-08-12 22:54:32,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 12950, loss[loss=0.1139, beats_loss=0.01029, ecapa_loss=0.0001298, whisper_loss=0.1023, over 19320.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.000171, whisper_loss=0.092, over 3824655.04 frames. ], batch size: 71, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:54:35,668 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 22:54:37,037 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 22:54:42,373 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 22:54:49,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1868680.0, ans=0.1 2024-08-12 22:54:56,979 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 22:55:02,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1868780.0, ans=0.125 2024-08-12 22:55:07,704 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 22:55:13,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1868880.0, ans=0.04949747468305833 2024-08-12 22:55:33,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1868980.0, ans=0.125 2024-08-12 22:55:40,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13000, loss[loss=0.09185, beats_loss=0.0123, ecapa_loss=0.000167, whisper_loss=0.07788, over 22127.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001695, whisper_loss=0.09213, over 3871458.64 frames. ], batch size: 93, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:55:41,656 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 22:55:48,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1869080.0, ans=0.09899494936611666 2024-08-12 22:56:00,124 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-12 22:56:02,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0 2024-08-12 22:56:04,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1869180.0, ans=0.2 2024-08-12 22:56:09,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.487e+01 2.816e+01 3.426e+01 7.138e+01, threshold=5.633e+01, percent-clipped=2.0 2024-08-12 22:56:21,227 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 22:56:33,196 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 22:56:46,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13050, loss[loss=0.1181, beats_loss=0.009582, ecapa_loss=0.0001807, whisper_loss=0.1068, over 16930.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001705, whisper_loss=0.09143, over 3872782.62 frames. ], batch size: 65, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:57:06,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1869680.0, ans=0.125 2024-08-12 22:57:19,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1869780.0, ans=0.125 2024-08-12 22:57:20,744 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 22:57:32,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1869880.0, ans=0.0 2024-08-12 22:57:44,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1869980.0, ans=0.125 2024-08-12 22:57:47,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1869980.0, ans=0.0 2024-08-12 22:57:49,731 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 22:57:52,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1870080.0, ans=0.04949747468305833 2024-08-12 22:57:53,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13100, loss[loss=0.1111, beats_loss=0.007981, ecapa_loss=0.00022, whisper_loss=0.1009, over 18635.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.0001708, whisper_loss=0.09113, over 3869144.64 frames. 
], batch size: 75, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:58:05,161 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 22:58:23,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.487e+01 2.739e+01 3.111e+01 4.282e+01, threshold=5.479e+01, percent-clipped=0.0 2024-08-12 22:59:00,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13150, loss[loss=0.09795, beats_loss=0.011, ecapa_loss=0.0001909, whisper_loss=0.08505, over 22619.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001695, whisper_loss=0.09143, over 3861273.37 frames. ], batch size: 93, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:59:03,149 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 22:59:14,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1870680.0, ans=0.2 2024-08-12 22:59:19,343 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 22:59:29,948 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 22:59:35,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2024-08-12 22:59:38,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1870780.0, ans=0.125 2024-08-12 22:59:44,847 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 22:59:56,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1870980.0, ans=0.2 2024-08-12 22:59:59,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-08-12 23:00:06,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13200, loss[loss=0.09462, beats_loss=0.009338, ecapa_loss=0.0001682, whisper_loss=0.0836, over 15419.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001696, whisper_loss=0.09177, over 3880082.43 frames. ], batch size: 61, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:00:06,810 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 23:00:16,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1871080.0, ans=0.125 2024-08-12 23:00:30,348 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 23:00:36,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.560e+01 2.764e+01 3.178e+01 9.126e+01, threshold=5.529e+01, percent-clipped=1.0 2024-08-12 23:00:42,329 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 23:00:49,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1871380.0, ans=0.125 2024-08-12 23:00:50,177 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 23:00:51,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1871380.0, ans=0.1 2024-08-12 23:00:56,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1871380.0, ans=0.125 2024-08-12 23:01:07,547 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 23:01:12,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13250, loss[loss=0.09999, beats_loss=0.01123, ecapa_loss=0.0001412, whisper_loss=0.08734, over 18946.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001708, whisper_loss=0.09145, over 3866379.37 frames. ], batch size: 73, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:01:17,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1871580.0, ans=0.125 2024-08-12 23:01:17,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-08-12 23:01:21,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1871580.0, ans=0.125 2024-08-12 23:01:21,557 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.933e-02 2024-08-12 23:01:21,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1871580.0, ans=0.125 2024-08-12 23:01:28,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1871680.0, ans=0.0 2024-08-12 23:01:29,429 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 23:01:48,592 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 23:01:50,319 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:01:51,279 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 23:01:57,977 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-12 23:02:00,484 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-12 23:02:03,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1871880.0, ans=0.125 2024-08-12 23:02:20,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13300, loss[loss=0.1053, beats_loss=0.01114, ecapa_loss=0.0001819, whisper_loss=0.09234, over 22370.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001701, whisper_loss=0.09146, over 3881615.87 frames. ], batch size: 92, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:02:37,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1872180.0, ans=0.09899494936611666 2024-08-12 23:02:44,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1872180.0, ans=0.125 2024-08-12 23:02:45,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1872180.0, ans=0.5 2024-08-12 23:02:51,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.48 vs. 
limit=22.5 2024-08-12 23:02:52,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.491e+01 2.756e+01 2.982e+01 7.499e+01, threshold=5.512e+01, percent-clipped=1.0 2024-08-12 23:02:53,863 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 23:02:57,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1872280.0, ans=0.125 2024-08-12 23:03:04,821 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-12 23:03:28,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13350, loss[loss=0.09783, beats_loss=0.01286, ecapa_loss=0.0001745, whisper_loss=0.08323, over 21688.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001694, whisper_loss=0.09188, over 3902559.50 frames. ], batch size: 91, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:03:34,142 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 23:03:34,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1872580.0, ans=0.05 2024-08-12 23:03:39,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1872580.0, ans=0.04949747468305833 2024-08-12 23:03:45,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1872680.0, ans=0.0 2024-08-12 23:03:55,405 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 23:03:58,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1872780.0, ans=0.125 2024-08-12 23:03:59,476 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 23:04:15,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-08-12 23:04:21,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1872980.0, ans=0.2 2024-08-12 23:04:35,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13400, loss[loss=0.09533, beats_loss=0.009919, ecapa_loss=0.0001719, whisper_loss=0.08369, over 14164.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001701, whisper_loss=0.09246, over 3881868.12 frames. ], batch size: 55, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:04:47,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1873180.0, ans=0.04949747468305833 2024-08-12 23:05:05,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1873280.0, ans=0.125 2024-08-12 23:05:06,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.402e+01 2.808e+01 3.201e+01 5.167e+01, threshold=5.616e+01, percent-clipped=0.0 2024-08-12 23:05:11,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1873280.0, ans=0.04949747468305833 2024-08-12 23:05:18,256 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-12 23:05:33,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.12 vs. limit=22.5 2024-08-12 23:05:41,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13450, loss[loss=0.1204, beats_loss=0.01029, ecapa_loss=0.000194, whisper_loss=0.1082, over 22276.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01088, ecapa_loss=0.0001709, whisper_loss=0.09213, over 3891072.49 frames. ], batch size: 93, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:05:41,590 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 23:05:51,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1873580.0, ans=0.0 2024-08-12 23:05:53,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1873580.0, ans=0.1 2024-08-12 23:06:00,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-12 23:06:00,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2024-08-12 23:06:04,724 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 23:06:29,624 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 23:06:40,395 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 23:06:44,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1873980.0, ans=0.125 2024-08-12 23:06:48,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13500, loss[loss=0.1155, beats_loss=0.007494, ecapa_loss=0.0002323, whisper_loss=0.1057, over 15193.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001706, whisper_loss=0.09209, over 3893749.08 frames. 
], batch size: 61, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:06:57,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-08-12 23:06:58,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5 2024-08-12 23:07:03,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1874180.0, ans=0.125 2024-08-12 23:07:10,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1874180.0, ans=0.125 2024-08-12 23:07:19,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.452e+01 2.723e+01 3.030e+01 4.696e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 23:07:19,637 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 23:07:40,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2024-08-12 23:07:55,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13550, loss[loss=0.1167, beats_loss=0.01044, ecapa_loss=0.0002323, whisper_loss=0.1039, over 14091.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.011, ecapa_loss=0.0001708, whisper_loss=0.09152, over 3874076.85 frames. ], batch size: 60, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:08:13,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1874680.0, ans=0.0 2024-08-12 23:08:20,576 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 23:08:37,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1874880.0, ans=0.05 2024-08-12 23:08:41,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1874880.0, ans=0.95 2024-08-12 23:08:50,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-08-12 23:09:02,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13600, loss[loss=0.09389, beats_loss=0.01389, ecapa_loss=0.0001399, whisper_loss=0.0786, over 22153.00 frames. ], tot_loss[loss=0.104, beats_loss=0.011, ecapa_loss=0.0001704, whisper_loss=0.09133, over 3857530.35 frames. ], batch size: 90, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:09:06,236 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 23:09:29,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1875280.0, ans=0.125 2024-08-12 23:09:32,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.462e+01 2.883e+01 3.310e+01 7.463e+01, threshold=5.766e+01, percent-clipped=1.0 2024-08-12 23:09:38,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1875280.0, ans=0.125 2024-08-12 23:09:45,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0 2024-08-12 23:09:45,801 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 23:09:50,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1875380.0, ans=0.125 2024-08-12 23:09:57,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1875480.0, ans=0.0 2024-08-12 23:10:07,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13650, loss[loss=0.1056, beats_loss=0.01207, ecapa_loss=0.0001822, whisper_loss=0.09171, over 23020.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01111, ecapa_loss=0.0001701, whisper_loss=0.09083, over 3876564.06 frames. ], batch size: 95, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:10:07,408 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 23:10:11,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0 2024-08-12 23:10:14,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1875580.0, ans=0.125 2024-08-12 23:10:26,180 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 23:10:26,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1875680.0, ans=0.125 2024-08-12 23:10:30,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1875680.0, ans=0.0 2024-08-12 23:10:33,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1875780.0, ans=0.125 2024-08-12 23:10:34,374 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 23:10:36,756 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 23:10:51,458 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 23:11:11,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-08-12 23:11:11,767 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 23:11:14,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13700, loss[loss=0.1192, beats_loss=0.01019, ecapa_loss=0.0001816, whisper_loss=0.1072, over 20393.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0111, ecapa_loss=0.0001703, whisper_loss=0.09076, over 3866581.39 frames. ], batch size: 80, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:11:20,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1876080.0, ans=0.125 2024-08-12 23:11:26,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1876080.0, ans=0.125 2024-08-12 23:11:30,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2024-08-12 23:11:44,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.467e+01 2.777e+01 3.137e+01 6.258e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-12 23:11:53,331 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
40 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 23:11:56,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1876380.0, ans=0.0 2024-08-12 23:12:00,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-12 23:12:21,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13750, loss[loss=0.1199, beats_loss=0.01133, ecapa_loss=0.0001939, whisper_loss=0.1066, over 20497.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01102, ecapa_loss=0.0001699, whisper_loss=0.09116, over 3875999.01 frames. ], batch size: 83, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:12:29,112 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 23:12:37,971 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 23:12:41,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1876680.0, ans=0.125 2024-08-12 23:13:00,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.85 vs. 
limit=22.5 2024-08-12 23:13:08,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1876880.0, ans=0.125 2024-08-12 23:13:16,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1876980.0, ans=0.125 2024-08-12 23:13:21,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1876980.0, ans=0.125 2024-08-12 23:13:27,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1876980.0, ans=0.125 2024-08-12 23:13:28,383 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 23:13:31,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13800, loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001822, whisper_loss=0.09234, over 21545.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.011, ecapa_loss=0.0001705, whisper_loss=0.0909, over 3851762.99 frames. ], batch size: 89, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:13:41,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1877080.0, ans=0.0 2024-08-12 23:13:43,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1877080.0, ans=0.125 2024-08-12 23:13:44,575 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 23:13:50,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1877180.0, ans=0.125 2024-08-12 23:13:51,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1877180.0, ans=0.2 2024-08-12 23:14:04,113 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 23:14:06,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.452e+01 2.663e+01 3.049e+01 4.287e+01, threshold=5.326e+01, percent-clipped=0.0 2024-08-12 23:14:10,136 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 39 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 23:14:18,059 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 23:14:20,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1877380.0, ans=0.0 2024-08-12 23:14:31,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1877480.0, ans=0.125 2024-08-12 23:14:47,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13850, loss[loss=0.09465, beats_loss=0.01118, ecapa_loss=0.0001991, whisper_loss=0.08148, over 21948.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001689, whisper_loss=0.09127, over 3872104.86 frames. ], batch size: 94, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:14:54,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1877580.0, ans=0.125 2024-08-12 23:15:19,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1877780.0, ans=0.0 2024-08-12 23:15:24,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1877780.0, ans=0.125 2024-08-12 23:15:31,653 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 23:15:37,877 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 23:15:53,057 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 23:15:54,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1877980.0, ans=0.0 2024-08-12 23:16:04,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13900, loss[loss=0.1058, beats_loss=0.009012, ecapa_loss=0.0002221, whisper_loss=0.09457, over 20674.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001701, whisper_loss=0.09199, over 3883143.77 frames. ], batch size: 84, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:16:20,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1878180.0, ans=0.125 2024-08-12 23:16:27,855 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 23:16:39,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.486e+01 2.775e+01 2.978e+01 4.704e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 23:16:49,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1878380.0, ans=0.0 2024-08-12 23:16:56,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0 2024-08-12 23:17:19,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 13950, loss[loss=0.09185, beats_loss=0.01258, ecapa_loss=0.0002077, whisper_loss=0.0772, over 18534.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.0001691, whisper_loss=0.09212, over 3863160.62 frames. ], batch size: 78, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:18:07,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1878880.0, ans=0.0 2024-08-12 23:18:11,387 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
20 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-12 23:18:35,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14000, loss[loss=0.09997, beats_loss=0.01205, ecapa_loss=0.0001762, whisper_loss=0.08616, over 20396.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01088, ecapa_loss=0.0001684, whisper_loss=0.09266, over 3858666.64 frames. ], batch size: 84, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:18:36,867 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 23:18:41,945 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:18:50,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1879180.0, ans=0.125 2024-08-12 23:19:03,935 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 23:19:09,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.518e+01 2.898e+01 3.200e+01 5.053e+01, threshold=5.795e+01, percent-clipped=0.0 2024-08-12 23:19:26,434 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-12 23:19:31,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1879380.0, ans=0.1 2024-08-12 23:19:32,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1879380.0, ans=0.125 2024-08-12 23:19:43,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1879480.0, ans=0.0 2024-08-12 23:19:51,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14050, loss[loss=0.1107, beats_loss=0.01092, ecapa_loss=0.0001169, whisper_loss=0.09861, over 23467.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001679, whisper_loss=0.09209, over 3860252.21 frames. ], batch size: 89, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:19:53,137 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 23:19:56,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1879580.0, ans=0.125 2024-08-12 23:19:57,933 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 23:20:12,296 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 23:20:15,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1879680.0, ans=0.0 2024-08-12 23:20:25,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1879780.0, ans=0.0 2024-08-12 23:20:50,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:21:01,289 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:21:06,305 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 23:21:08,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14100, loss[loss=0.09846, beats_loss=0.00941, ecapa_loss=0.0001582, whisper_loss=0.08747, over 18126.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09194, over 3818685.56 frames. 
], batch size: 70, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:21:44,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.402e+01 2.759e+01 3.024e+01 5.678e+01, threshold=5.519e+01, percent-clipped=0.0 2024-08-12 23:21:48,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1880280.0, ans=0.1 2024-08-12 23:21:53,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1880380.0, ans=0.125 2024-08-12 23:22:24,720 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-12 23:22:26,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1880580.0, ans=0.125 2024-08-12 23:22:27,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14150, loss[loss=0.1071, beats_loss=0.01141, ecapa_loss=0.0001391, whisper_loss=0.09427, over 23667.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001692, whisper_loss=0.09125, over 3830733.61 frames. ], batch size: 92, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:22:53,132 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 23:23:36,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-12 23:23:40,998 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 23:23:45,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1881080.0, ans=0.0 2024-08-12 23:23:46,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14200, loss[loss=0.11, beats_loss=0.01198, ecapa_loss=0.0001684, whisper_loss=0.09634, over 22793.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001684, whisper_loss=0.0918, over 3862522.06 frames. ], batch size: 90, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:23:50,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1881080.0, ans=0.125 2024-08-12 23:24:07,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1881180.0, ans=0.1 2024-08-12 23:24:24,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.554e+01 2.881e+01 3.378e+01 7.854e+01, threshold=5.762e+01, percent-clipped=3.0 2024-08-12 23:24:46,449 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 23:25:05,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1881480.0, ans=0.125 2024-08-12 23:25:07,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14250, loss[loss=0.1109, beats_loss=0.00909, ecapa_loss=0.0001907, whisper_loss=0.09988, over 16872.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001691, whisper_loss=0.09188, over 3887285.85 frames. ], batch size: 69, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:25:19,546 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 23:25:23,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1881680.0, ans=0.125 2024-08-12 23:25:25,686 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 23:25:37,472 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 23:25:49,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-08-12 23:25:49,721 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 23:25:50,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1881780.0, ans=0.2 2024-08-12 23:25:52,668 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 23:25:58,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1881880.0, ans=0.125 2024-08-12 23:26:21,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1881980.0, ans=0.125 2024-08-12 23:26:23,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1882080.0, ans=0.95 2024-08-12 23:26:23,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1882080.0, ans=0.05 2024-08-12 23:26:24,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14300, loss[loss=0.1044, beats_loss=0.0122, ecapa_loss=0.0001481, whisper_loss=0.09076, over 21825.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001687, whisper_loss=0.09174, over 3885018.25 frames. ], batch size: 87, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:26:26,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-12 23:26:36,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2024-08-12 23:26:54,257 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 23:26:56,883 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 23:26:58,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.532e+01 2.791e+01 3.195e+01 4.924e+01, threshold=5.583e+01, percent-clipped=0.0 2024-08-12 23:27:12,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1882380.0, ans=0.2 2024-08-12 23:27:16,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1882380.0, ans=0.125 2024-08-12 23:27:19,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1882380.0, ans=0.125 2024-08-12 23:27:29,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1882480.0, ans=0.0 2024-08-12 23:27:39,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14350, loss[loss=0.1319, beats_loss=0.009248, ecapa_loss=0.0001957, whisper_loss=0.1207, over 16563.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.000168, whisper_loss=0.09121, over 3900425.25 frames. 
], batch size: 66, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:27:53,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1882580.0, ans=0.125 2024-08-12 23:28:14,346 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 23:28:34,573 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-12 23:28:47,284 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 23:28:58,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14400, loss[loss=0.08939, beats_loss=0.01505, ecapa_loss=0.0001329, whisper_loss=0.07301, over 24200.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001688, whisper_loss=0.0917, over 3913395.41 frames. ], batch size: 98, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:29:17,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1883180.0, ans=0.0 2024-08-12 23:29:17,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1883180.0, ans=0.125 2024-08-12 23:29:20,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1883180.0, ans=0.1 2024-08-12 23:29:33,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.540e+01 2.866e+01 3.197e+01 2.206e+02, threshold=5.732e+01, percent-clipped=2.0 2024-08-12 23:29:36,797 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 23:29:37,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1883280.0, ans=0.125 2024-08-12 23:29:37,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=15.0 2024-08-12 23:30:04,298 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 23:30:14,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 13, batch 14450, loss[loss=0.1326, beats_loss=0.0102, ecapa_loss=0.0001415, whisper_loss=0.121, over 17611.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01112, ecapa_loss=0.0001681, whisper_loss=0.09134, over 3900373.42 frames. ], batch size: 65, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:30:15,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1883580.0, ans=0.125 2024-08-12 23:30:21,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1883580.0, ans=0.125 2024-08-12 23:30:25,687 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 23:30:31,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1883680.0, ans=0.2 2024-08-12 23:30:41,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1883680.0, ans=0.0 2024-08-12 23:30:52,683 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 23:31:03,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1883880.0, ans=0.1 2024-08-12 23:31:05,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1883880.0, ans=0.1 2024-08-12 23:31:54,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 0, loss[loss=0.1098, beats_loss=0.009085, ecapa_loss=0.0001714, whisper_loss=0.09903, over 22567.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.009085, ecapa_loss=0.0001714, whisper_loss=0.09903, over 22567.00 frames. ], batch size: 89, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:31:54,481 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-12 23:32:30,934 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on ASR_libri: loss=0.2554, beats_loss=0, ecapa_loss=0.0005808, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 23:32:47,267 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on SV_voxceleb1: loss=0.004647, beats_loss=0, ecapa_loss=0.0004647, whisper_loss=0, over 939242.00 frames. 2024-08-12 23:34:33,397 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on AT_audioset: loss=0.02401, beats_loss=0.02401, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 23:34:33,400 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-12 23:35:32,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.64 vs. 
limit=10.0 2024-08-12 23:35:52,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1884290.0, ans=0.1 2024-08-12 23:35:53,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.601e+01 2.897e+01 3.214e+01 1.891e+02, threshold=5.795e+01, percent-clipped=1.0 2024-08-12 23:35:54,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1884290.0, ans=0.0 2024-08-12 23:36:25,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884390.0, ans=0.1 2024-08-12 23:36:35,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 50, loss[loss=0.08653, beats_loss=0.01034, ecapa_loss=0.0001815, whisper_loss=0.07437, over 17284.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0102, ecapa_loss=0.0001725, whisper_loss=0.08992, over 896729.39 frames. ], batch size: 69, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:36:41,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1884490.0, ans=0.0 2024-08-12 23:36:56,797 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 23:37:08,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1884590.0, ans=0.0 2024-08-12 23:37:55,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884790.0, ans=0.1 2024-08-12 23:38:03,514 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 23:38:33,705 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
13 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 23:38:43,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 100, loss[loss=0.132, beats_loss=0.009028, ecapa_loss=0.0001679, whisper_loss=0.1213, over 23920.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01021, ecapa_loss=0.0001716, whisper_loss=0.0896, over 1534369.24 frames. ], batch size: 90, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:38:49,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1884990.0, ans=0.05 2024-08-12 23:38:51,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2024-08-12 23:39:24,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.59 vs. limit=22.5 2024-08-12 23:40:00,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1885190.0, ans=0.07 2024-08-12 23:40:05,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1885190.0, ans=0.0 2024-08-12 23:40:24,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.825e+01 3.064e+01 3.241e+01 4.540e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 23:40:45,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1885290.0, ans=0.125 2024-08-12 23:40:48,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1885390.0, ans=10.0 2024-08-12 23:41:02,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. 
limit=6.0 2024-08-12 23:41:11,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 150, loss[loss=0.1146, beats_loss=0.01089, ecapa_loss=0.000155, whisper_loss=0.1022, over 20504.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.009923, ecapa_loss=0.0001753, whisper_loss=0.0916, over 2034388.19 frames. ], batch size: 77, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:41:25,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1885490.0, ans=0.05 2024-08-12 23:41:40,648 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 23:42:04,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1885690.0, ans=0.05 2024-08-12 23:42:32,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1885790.0, ans=0.125 2024-08-12 23:42:33,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-12 23:42:41,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1885790.0, ans=0.125 2024-08-12 23:42:51,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1885790.0, ans=0.125 2024-08-12 23:43:08,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1885890.0, ans=0.0 2024-08-12 23:43:23,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 200, loss[loss=0.08899, beats_loss=0.01334, ecapa_loss=0.0001206, whisper_loss=0.07445, over 16858.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01012, ecapa_loss=0.0001749, whisper_loss=0.09156, over 2395675.48 frames. ], batch size: 66, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:43:39,349 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 15 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 23:43:43,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1885990.0, ans=0.125 2024-08-12 23:43:53,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-08-12 23:44:02,940 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 23:44:16,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1886190.0, ans=0.125 2024-08-12 23:44:16,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1886190.0, ans=0.0 2024-08-12 23:44:22,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1886190.0, ans=0.125 2024-08-12 23:44:40,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.587e+01 2.870e+01 3.355e+01 1.552e+02, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 23:44:55,825 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 23:45:13,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1886390.0, ans=0.125 2024-08-12 23:45:26,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 250, loss[loss=0.09284, beats_loss=0.0098, ecapa_loss=0.0001383, whisper_loss=0.08166, over 15359.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01031, ecapa_loss=0.0001726, whisper_loss=0.09122, over 2721066.75 frames. ], batch size: 57, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:45:26,742 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 23:46:15,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1886590.0, ans=0.125 2024-08-12 23:46:19,718 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 23:46:46,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-08-12 23:46:58,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1886790.0, ans=0.125 2024-08-12 23:47:11,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1886890.0, ans=0.09899494936611666 2024-08-12 23:47:18,278 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 23:47:21,570 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 15 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 23:47:27,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 300, loss[loss=0.1059, beats_loss=0.01056, ecapa_loss=0.0001862, whisper_loss=0.0935, over 17024.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001715, whisper_loss=0.0913, over 2966893.52 frames. ], batch size: 69, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:47:33,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. 
limit=15.0 2024-08-12 23:47:40,068 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 23:47:41,711 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 23:47:42,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-08-12 23:47:44,968 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 23:48:10,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1887190.0, ans=0.1 2024-08-12 23:48:16,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.332e+01 2.692e+01 3.047e+01 7.964e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-12 23:48:20,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2024-08-12 23:48:20,927 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 23:48:44,166 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 350, loss[loss=0.1126, beats_loss=0.01095, ecapa_loss=0.0001555, whisper_loss=0.1001, over 15935.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001709, whisper_loss=0.09107, over 3157616.47 frames. ], batch size: 61, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:49:01,653 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 23:49:01,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1887590.0, ans=0.125 2024-08-12 23:49:14,876 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 23:49:15,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1887690.0, ans=0.2 2024-08-12 23:49:18,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1887690.0, ans=0.1 2024-08-12 23:49:24,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1887690.0, ans=0.0 2024-08-12 23:49:59,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 400, loss[loss=0.09002, beats_loss=0.01317, ecapa_loss=0.0001171, whisper_loss=0.07568, over 19079.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01061, ecapa_loss=0.0001691, whisper_loss=0.09134, over 3298408.73 frames. ], batch size: 75, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:50:07,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1887990.0, ans=0.125 2024-08-12 23:50:08,157 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-12 23:50:22,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1888090.0, ans=0.125 2024-08-12 23:50:22,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1888090.0, ans=0.2 2024-08-12 23:50:33,064 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
18 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-12 23:50:46,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1888290.0, ans=0.1 2024-08-12 23:50:47,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1888290.0, ans=0.09899494936611666 2024-08-12 23:50:51,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.354e+01 2.624e+01 3.158e+01 4.755e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 23:50:54,662 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 23:50:57,483 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 23:50:59,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=1888290.0, ans=12.0 2024-08-12 23:51:17,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 450, loss[loss=0.09527, beats_loss=0.01073, ecapa_loss=0.0001718, whisper_loss=0.08283, over 21135.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001679, whisper_loss=0.09145, over 3410278.88 frames. ], batch size: 80, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:51:21,611 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-12 23:52:00,108 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 23:52:04,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1888790.0, ans=0.125 2024-08-12 23:52:10,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1888790.0, ans=0.0 2024-08-12 23:52:13,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1888790.0, ans=0.0 2024-08-12 23:52:19,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1888890.0, ans=0.2 2024-08-12 23:52:23,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1888890.0, ans=0.125 2024-08-12 23:52:26,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1888890.0, ans=0.125 2024-08-12 23:52:33,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 500, loss[loss=0.1137, beats_loss=0.009346, ecapa_loss=0.0001797, whisper_loss=0.1026, over 18301.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001672, whisper_loss=0.09146, over 3501119.47 frames. ], batch size: 71, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:52:37,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.96 vs. 
limit=15.0 2024-08-12 23:52:54,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1889090.0, ans=0.0 2024-08-12 23:53:00,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1889090.0, ans=0.0 2024-08-12 23:53:10,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1889190.0, ans=0.0 2024-08-12 23:53:24,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.385e+01 2.695e+01 3.088e+01 5.680e+01, threshold=5.390e+01, percent-clipped=1.0 2024-08-12 23:53:39,817 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 23:53:51,187 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 550, loss[loss=0.09939, beats_loss=0.01007, ecapa_loss=0.0001545, whisper_loss=0.08777, over 18152.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001667, whisper_loss=0.0911, over 3611894.88 frames. ], batch size: 65, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:54:01,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1889490.0, ans=0.125 2024-08-12 23:54:20,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-12 23:54:43,576 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 23:54:45,141 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 23:55:05,477 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 23:55:06,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 600, loss[loss=0.1115, beats_loss=0.012, ecapa_loss=0.000141, whisper_loss=0.09812, over 23647.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001657, whisper_loss=0.09165, over 3678794.97 frames. ], batch size: 91, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:55:11,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1889990.0, ans=0.125 2024-08-12 23:55:27,600 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 23:55:29,026 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 23:55:35,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1890190.0, ans=0.125 2024-08-12 23:55:37,853 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 23:55:54,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.472e+01 2.658e+01 3.015e+01 7.457e+01, threshold=5.315e+01, percent-clipped=1.0 2024-08-12 23:56:01,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1890290.0, ans=0.125 2024-08-12 23:56:20,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 650, loss[loss=0.09764, beats_loss=0.01135, ecapa_loss=0.0001448, whisper_loss=0.08484, over 18160.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001642, whisper_loss=0.09072, over 3740227.93 frames. ], batch size: 72, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:56:38,973 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 23:56:48,267 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 23:56:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1890690.0, ans=0.125 2024-08-12 23:56:59,688 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 23:57:02,709 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 23:57:18,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1890790.0, ans=0.125 2024-08-12 23:57:27,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1890890.0, ans=0.125 2024-08-12 23:57:36,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 700, loss[loss=0.1035, beats_loss=0.01018, ecapa_loss=0.0001629, whisper_loss=0.09171, over 23207.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001655, whisper_loss=0.09102, over 3755202.53 frames. ], batch size: 91, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:57:42,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1890990.0, ans=0.0 2024-08-12 23:57:45,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1890990.0, ans=0.125 2024-08-12 23:57:50,822 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 23:57:56,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1891090.0, ans=0.1 2024-08-12 23:58:07,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1891190.0, ans=0.125 2024-08-12 23:58:09,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1891190.0, ans=0.125 2024-08-12 23:58:17,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1891190.0, ans=0.0 2024-08-12 23:58:24,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.432e+01 2.727e+01 3.024e+01 4.665e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 23:58:24,599 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 23:58:26,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1891290.0, ans=0.0 2024-08-12 23:58:28,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-12 23:58:32,111 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 23:58:33,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1891390.0, ans=0.0 2024-08-12 23:58:34,812 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 23:58:37,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2024-08-12 23:58:46,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1891390.0, ans=0.125 2024-08-12 23:58:47,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-12 23:58:49,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 750, loss[loss=0.1237, beats_loss=0.008888, ecapa_loss=0.0001891, whisper_loss=0.1129, over 21551.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.09162, over 3781595.06 frames. ], batch size: 88, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:59:00,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1891490.0, ans=0.125 2024-08-12 23:59:00,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1891490.0, ans=0.125 2024-08-12 23:59:08,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1891590.0, ans=0.04949747468305833 2024-08-12 23:59:10,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1891590.0, ans=0.2 2024-08-12 23:59:14,264 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 23:59:46,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1891790.0, ans=0.1 2024-08-12 23:59:53,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. 
limit=15.0 2024-08-12 23:59:56,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2024-08-12 23:59:58,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1891890.0, ans=0.125 2024-08-12 23:59:59,435 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 23:59:59,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1891890.0, ans=0.0 2024-08-13 00:00:03,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 800, loss[loss=0.06808, beats_loss=0.01321, ecapa_loss=0.0001459, whisper_loss=0.05342, over 19906.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001663, whisper_loss=0.09089, over 3760474.94 frames. ], batch size: 80, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:00:13,511 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 00:00:27,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1892090.0, ans=0.0 2024-08-13 00:00:37,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2024-08-13 00:00:52,143 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 00:00:54,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.376e+01 2.556e+01 2.956e+01 7.880e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-13 00:01:02,433 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 27 from Vox, 26 from AS 2024-08-13 00:01:08,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1892390.0, ans=0.2 2024-08-13 00:01:10,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1892390.0, ans=0.125 2024-08-13 00:01:10,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-13 00:01:13,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1892390.0, ans=0.0 2024-08-13 00:01:16,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2024-08-13 00:01:19,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 850, loss[loss=0.12, beats_loss=0.009383, ecapa_loss=0.0001324, whisper_loss=0.1092, over 20616.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001646, whisper_loss=0.09004, over 3774834.59 frames. ], batch size: 75, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:01:23,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1892490.0, ans=0.125 2024-08-13 00:01:23,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-13 00:01:34,373 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 from AS 2024-08-13 00:01:58,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1892690.0, ans=0.125 2024-08-13 00:02:11,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1892790.0, ans=0.0 2024-08-13 00:02:11,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1892790.0, ans=0.1 2024-08-13 00:02:15,279 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 11 from Vox, 33 from AS 2024-08-13 00:02:31,995 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 900, loss[loss=0.1088, beats_loss=0.01072, ecapa_loss=0.0001522, whisper_loss=0.09652, over 19250.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001651, whisper_loss=0.09031, over 3753975.65 frames. ], batch size: 75, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:02:34,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1892990.0, ans=0.0 2024-08-13 00:02:36,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1892990.0, ans=0.0 2024-08-13 00:02:51,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1893090.0, ans=0.0 2024-08-13 00:03:04,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1893190.0, ans=0.1 2024-08-13 00:03:18,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1893290.0, ans=0.0 2024-08-13 00:03:19,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, 
metric=8.22 vs. limit=10.0 2024-08-13 00:03:19,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.408e+01 2.662e+01 2.977e+01 4.425e+01, threshold=5.325e+01, percent-clipped=0.0 2024-08-13 00:03:23,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0 2024-08-13 00:03:25,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1893290.0, ans=0.125 2024-08-13 00:03:39,355 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:03:40,294 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 from AS 2024-08-13 00:03:41,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1893390.0, ans=0.125 2024-08-13 00:03:44,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 950, loss[loss=0.09202, beats_loss=0.0113, ecapa_loss=0.0001605, whisper_loss=0.07911, over 16646.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001645, whisper_loss=0.09007, over 3769370.72 frames. ], batch size: 67, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:03:45,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-13 00:03:51,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1893490.0, ans=0.125 2024-08-13 00:03:58,302 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 from AS 2024-08-13 00:04:14,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1893690.0, ans=0.0 2024-08-13 00:04:17,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1893690.0, ans=0.2 2024-08-13 00:04:26,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1893690.0, ans=0.2 2024-08-13 00:04:28,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1893690.0, ans=0.125 2024-08-13 00:04:34,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1893790.0, ans=0.2 2024-08-13 00:04:43,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-08-13 00:04:44,134 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-13 00:04:51,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-13 00:04:53,498 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 00:04:59,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1000, loss[loss=0.09946, beats_loss=0.01146, ecapa_loss=0.0001532, whisper_loss=0.08647, over 21815.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001653, whisper_loss=0.09001, over 3785134.18 frames. 
], batch size: 87, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:05:19,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1894090.0, ans=0.125 2024-08-13 00:05:39,641 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 from AS 2024-08-13 00:05:40,989 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 00:05:45,387 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 from AS 2024-08-13 00:05:48,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.405e+01 2.688e+01 3.061e+01 4.317e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 00:05:49,649 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS 2024-08-13 00:05:50,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0 2024-08-13 00:05:54,694 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 20 from Vox, 30 from AS 2024-08-13 00:05:56,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1894290.0, ans=0.025 2024-08-13 00:05:59,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-13 00:06:13,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1050, loss[loss=0.09871, beats_loss=0.01462, ecapa_loss=0.0001419, whisper_loss=0.08267, over 23572.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01086, ecapa_loss=0.0001642, whisper_loss=0.09009, over 3818884.63 frames. 
], batch size: 93, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:06:19,555 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-13 00:06:27,839 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 00:07:33,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1894990.0, ans=0.2 2024-08-13 00:07:34,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1100, loss[loss=0.1058, beats_loss=0.008554, ecapa_loss=0.0001785, whisper_loss=0.09545, over 18520.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.000165, whisper_loss=0.09059, over 3823756.06 frames. ], batch size: 69, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:07:44,397 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-13 00:08:18,033 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.388e+01 2024-08-13 00:08:23,799 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 26 from LS+wenet, 9 from Vox, 23 from AS 2024-08-13 00:08:25,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.502e+01 2.869e+01 3.346e+01 6.186e+01, threshold=5.739e+01, percent-clipped=2.0 2024-08-13 00:08:28,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=15.0 2024-08-13 00:08:41,200 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
34 from LS+wenet, 23 from Vox, 29 from AS 2024-08-13 00:08:43,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1895390.0, ans=0.125 2024-08-13 00:08:46,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1895390.0, ans=0.125 2024-08-13 00:08:49,325 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-13 00:08:51,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1150, loss[loss=0.1083, beats_loss=0.01076, ecapa_loss=0.0001271, whisper_loss=0.09629, over 23834.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.000164, whisper_loss=0.09129, over 3853479.61 frames. ], batch size: 90, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:08:59,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1895490.0, ans=0.2 2024-08-13 00:09:00,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1895490.0, ans=0.125 2024-08-13 00:09:04,420 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS 2024-08-13 00:09:04,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1895490.0, ans=0.0 2024-08-13 00:09:11,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.70 vs. 
limit=15.0 2024-08-13 00:09:35,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1895690.0, ans=0.125 2024-08-13 00:09:43,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1895790.0, ans=0.0 2024-08-13 00:10:10,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1200, loss[loss=0.1206, beats_loss=0.0103, ecapa_loss=0.00014, whisper_loss=0.1089, over 15958.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001648, whisper_loss=0.09126, over 3876158.85 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:10:10,614 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 15 from Vox, 42 from AS 2024-08-13 00:10:18,073 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 from AS 2024-08-13 00:10:22,414 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 20 from Vox, 39 from AS 2024-08-13 00:10:44,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-13 00:10:54,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2024-08-13 00:11:06,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.344e+01 2.617e+01 3.051e+01 6.950e+01, threshold=5.235e+01, percent-clipped=1.0 2024-08-13 00:11:08,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.40 vs. 
limit=10.0 2024-08-13 00:11:17,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1896390.0, ans=0.5 2024-08-13 00:11:26,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1896390.0, ans=0.1 2024-08-13 00:11:26,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2024-08-13 00:11:31,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1250, loss[loss=0.07939, beats_loss=0.01349, ecapa_loss=0.0001351, whisper_loss=0.06455, over 19918.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01093, ecapa_loss=0.0001626, whisper_loss=0.08995, over 3860121.04 frames. ], batch size: 78, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:11:41,401 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:12:00,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1896590.0, ans=0.0 2024-08-13 00:12:09,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1896690.0, ans=0.1 2024-08-13 00:12:12,013 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 00:12:25,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1896790.0, ans=0.025 2024-08-13 00:12:49,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1896990.0, ans=0.125 2024-08-13 00:12:50,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1300, loss[loss=0.106, beats_loss=0.0129, ecapa_loss=0.0001809, whisper_loss=0.0913, over 19498.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01098, ecapa_loss=0.0001631, whisper_loss=0.08996, over 3888247.76 frames. ], batch size: 83, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:12:50,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1896990.0, ans=0.0 2024-08-13 00:13:03,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1896990.0, ans=0.125 2024-08-13 00:13:24,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1897190.0, ans=0.2 2024-08-13 00:13:26,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1897190.0, ans=0.125 2024-08-13 00:13:34,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1897190.0, ans=0.0 2024-08-13 00:13:34,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1897190.0, ans=0.0 2024-08-13 00:13:34,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1897190.0, ans=0.125 2024-08-13 00:13:43,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.447e+01 2.732e+01 3.060e+01 
1.003e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-13 00:13:49,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-13 00:13:59,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1897390.0, ans=0.125 2024-08-13 00:14:10,646 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 00:14:12,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1350, loss[loss=0.1129, beats_loss=0.009912, ecapa_loss=0.0001678, whisper_loss=0.1014, over 21450.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01099, ecapa_loss=0.0001639, whisper_loss=0.08927, over 3880704.53 frames. ], batch size: 84, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:14:18,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0 2024-08-13 00:14:22,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1897490.0, ans=0.0 2024-08-13 00:14:23,931 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS 2024-08-13 00:14:32,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1897590.0, ans=0.0 2024-08-13 00:14:35,960 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 from AS 2024-08-13 00:14:36,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1897590.0, ans=0.0 2024-08-13 00:14:37,274 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-13 00:14:47,440 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS 2024-08-13 00:14:50,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1897690.0, ans=0.0 2024-08-13 00:15:03,785 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 00:15:06,706 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 from AS 2024-08-13 00:15:23,833 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS 2024-08-13 00:15:32,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1400, loss[loss=0.1188, beats_loss=0.00957, ecapa_loss=0.0001904, whisper_loss=0.1074, over 18210.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0109, ecapa_loss=0.0001647, whisper_loss=0.0898, over 3876882.80 frames. ], batch size: 71, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:16:11,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1898190.0, ans=0.0 2024-08-13 00:16:12,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1898190.0, ans=0.125 2024-08-13 00:16:25,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.414e+01 2.708e+01 3.137e+01 5.162e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-13 00:16:38,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1898390.0, ans=0.125 2024-08-13 00:16:39,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1898390.0, ans=0.125 2024-08-13 00:16:40,363 INFO [scaling.py:1024] (3/4) Whitening: 
name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-13 00:16:54,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1450, loss[loss=0.07703, beats_loss=0.01099, ecapa_loss=0.000161, whisper_loss=0.06443, over 16003.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001637, whisper_loss=0.09029, over 3856002.14 frames. ], batch size: 64, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:17:32,099 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 00:17:42,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1898590.0, ans=0.95 2024-08-13 00:17:44,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1898590.0, ans=0.125 2024-08-13 00:17:48,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1898590.0, ans=0.0 2024-08-13 00:17:58,316 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.281e-01 2024-08-13 00:18:17,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-13 00:18:18,229 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 28 from Vox, 28 from AS 2024-08-13 00:18:39,206 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 24 from Vox, 45 from AS 2024-08-13 00:18:41,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1898890.0, ans=0.0 2024-08-13 00:18:43,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1500, loss[loss=0.1042, beats_loss=0.009183, ecapa_loss=0.0001658, whisper_loss=0.09334, over 21132.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.0001636, whisper_loss=0.09012, over 3882471.57 frames. ], batch size: 86, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:18:55,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1898990.0, ans=0.2 2024-08-13 00:18:57,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1899090.0, ans=0.125 2024-08-13 00:18:57,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1899090.0, ans=0.5 2024-08-13 00:19:26,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1899190.0, ans=0.125 2024-08-13 00:19:27,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1899190.0, ans=0.125 2024-08-13 00:19:35,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.418e+01 2.688e+01 3.116e+01 4.487e+01, threshold=5.376e+01, percent-clipped=0.0 2024-08-13 00:19:54,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1899390.0, ans=0.0 2024-08-13 00:20:02,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1550, loss[loss=0.1202, beats_loss=0.01006, ecapa_loss=0.0001347, whisper_loss=0.1088, over 24011.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001632, whisper_loss=0.09004, over 3896939.22 frames. ], batch size: 91, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:20:04,277 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 from AS 2024-08-13 00:20:23,434 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-13 00:20:32,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-13 00:20:58,533 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 from AS 2024-08-13 00:21:00,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1899790.0, ans=0.125 2024-08-13 00:21:04,119 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 25 from Vox, 36 from AS 2024-08-13 00:21:20,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1600, loss[loss=0.1111, beats_loss=0.008232, ecapa_loss=0.000185, whisper_loss=0.101, over 14362.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001631, whisper_loss=0.09035, over 3878637.04 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:21:39,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.58 vs. 
limit=22.5 2024-08-13 00:21:46,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1900090.0, ans=0.025 2024-08-13 00:21:49,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1900090.0, ans=0.125 2024-08-13 00:21:52,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1900190.0, ans=0.125 2024-08-13 00:21:57,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-13 00:22:09,508 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 00:22:12,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.583e+01 2.856e+01 3.340e+01 1.108e+02, threshold=5.712e+01, percent-clipped=2.0 2024-08-13 00:22:17,846 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 00:22:18,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-13 00:22:20,737 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 from AS 2024-08-13 00:22:27,142 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:22:38,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1650, loss[loss=0.09681, beats_loss=0.01267, ecapa_loss=0.0001306, whisper_loss=0.08283, over 15146.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001613, whisper_loss=0.09083, over 3887852.71 frames. 
], batch size: 60, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:22:38,424 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 12 from Vox, 41 from AS 2024-08-13 00:22:38,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1900490.0, ans=0.2 2024-08-13 00:22:47,460 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:22:56,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1900590.0, ans=0.1 2024-08-13 00:22:57,483 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 12 from LS+wenet, 22 from Vox, 30 from AS 2024-08-13 00:23:06,407 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 from AS 2024-08-13 00:23:07,972 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 00:23:24,383 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 from AS 2024-08-13 00:23:26,072 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS 2024-08-13 00:23:30,549 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 from AS 2024-08-13 00:23:32,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1900790.0, ans=0.035 2024-08-13 00:23:39,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1900890.0, ans=0.125 2024-08-13 00:23:45,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. 
limit=15.0 2024-08-13 00:23:53,342 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1700, loss[loss=0.08596, beats_loss=0.01087, ecapa_loss=0.0001581, whisper_loss=0.07351, over 15934.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.09098, over 3867412.42 frames. ], batch size: 62, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:23:55,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1900990.0, ans=0.2 2024-08-13 00:24:10,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1901090.0, ans=0.0 2024-08-13 00:24:20,467 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 from AS 2024-08-13 00:24:39,987 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 from AS 2024-08-13 00:24:42,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.360e+01 2.688e+01 2.973e+01 4.042e+01, threshold=5.375e+01, percent-clipped=0.0 2024-08-13 00:24:53,470 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS 2024-08-13 00:24:56,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1901390.0, ans=0.125 2024-08-13 00:24:57,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1901390.0, ans=0.0 2024-08-13 00:25:07,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1750, loss[loss=0.1058, beats_loss=0.008434, ecapa_loss=0.000191, whisper_loss=0.09547, over 21074.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001613, whisper_loss=0.09099, over 3873632.50 frames. 
], batch size: 82, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:25:08,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-08-13 00:25:17,938 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 from AS 2024-08-13 00:25:19,142 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 00:25:36,244 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 00:25:41,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1901690.0, ans=0.125 2024-08-13 00:25:45,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1901690.0, ans=0.125 2024-08-13 00:25:52,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1901790.0, ans=0.125 2024-08-13 00:26:07,422 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 from AS 2024-08-13 00:26:19,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1901990.0, ans=0.5 2024-08-13 00:26:20,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1800, loss[loss=0.1067, beats_loss=0.01112, ecapa_loss=0.0001664, whisper_loss=0.09395, over 18785.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001614, whisper_loss=0.09069, over 3894061.07 frames. ], batch size: 76, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:26:24,392 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 from AS 2024-08-13 00:26:33,340 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
13 from LS+wenet, 21 from Vox, 24 from AS 2024-08-13 00:26:46,258 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 10 from Vox, 36 from AS 2024-08-13 00:26:46,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1902090.0, ans=0.125 2024-08-13 00:26:49,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1902090.0, ans=0.1 2024-08-13 00:26:52,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-13 00:27:12,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.455e+01 2.703e+01 3.083e+01 4.143e+01, threshold=5.406e+01, percent-clipped=0.0 2024-08-13 00:27:13,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1902290.0, ans=0.125 2024-08-13 00:27:40,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1850, loss[loss=0.0862, beats_loss=0.009013, ecapa_loss=0.0002159, whisper_loss=0.07503, over 13590.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001625, whisper_loss=0.09087, over 3874828.96 frames. ], batch size: 55, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:27:44,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1902490.0, ans=0.125 2024-08-13 00:27:50,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1902490.0, ans=0.125 2024-08-13 00:27:54,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. 
limit=15.0 2024-08-13 00:28:06,395 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 from AS 2024-08-13 00:28:09,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2024-08-13 00:28:25,892 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 from AS 2024-08-13 00:28:38,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902790.0, ans=0.1 2024-08-13 00:28:51,037 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 21 from Vox, 13 from AS 2024-08-13 00:28:52,894 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 from AS 2024-08-13 00:29:00,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1900, loss[loss=0.07642, beats_loss=0.0135, ecapa_loss=0.0001754, whisper_loss=0.06117, over 17278.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001626, whisper_loss=0.09066, over 3838007.73 frames. ], batch size: 77, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:29:07,556 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 from AS 2024-08-13 00:29:29,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1903090.0, ans=0.125 2024-08-13 00:29:33,624 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 00:29:36,971 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
23 from LS+wenet, 18 from Vox, 51 from AS 2024-08-13 00:29:53,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.471e+01 2.746e+01 3.040e+01 5.075e+01, threshold=5.492e+01, percent-clipped=0.0 2024-08-13 00:30:14,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1903390.0, ans=0.125 2024-08-13 00:30:20,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 1950, loss[loss=0.09884, beats_loss=0.01202, ecapa_loss=0.0001378, whisper_loss=0.08544, over 20995.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.000163, whisper_loss=0.09057, over 3832772.38 frames. ], batch size: 82, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:30:36,170 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 22 from Vox, 29 from AS 2024-08-13 00:30:42,585 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 from AS 2024-08-13 00:30:49,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1903590.0, ans=0.1 2024-08-13 00:31:03,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1903690.0, ans=0.0 2024-08-13 00:31:14,069 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 from AS 2024-08-13 00:31:18,602 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS 2024-08-13 00:31:39,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2000, loss[loss=0.08352, beats_loss=0.01126, ecapa_loss=0.0002299, whisper_loss=0.06997, over 15978.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001634, whisper_loss=0.09064, over 3821476.22 frames. 
], batch size: 69, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:32:05,458 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 00:32:10,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1904190.0, ans=0.125 2024-08-13 00:32:10,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1904190.0, ans=0.125 2024-08-13 00:32:15,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2024-08-13 00:32:18,333 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 30 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 00:32:18,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-13 00:32:21,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1904190.0, ans=0.025 2024-08-13 00:32:30,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.391e+01 2.734e+01 3.144e+01 4.841e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-13 00:32:32,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-13 00:32:56,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2050, loss[loss=0.1115, beats_loss=0.008136, ecapa_loss=0.0001601, whisper_loss=0.1018, over 16305.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001641, whisper_loss=0.09105, over 3846979.23 frames. 
], batch size: 62, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:33:16,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1904590.0, ans=0.125 2024-08-13 00:33:27,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1904690.0, ans=0.1 2024-08-13 00:33:28,595 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 00:33:42,467 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 00:33:58,233 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:34:04,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1904890.0, ans=0.0 2024-08-13 00:34:12,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2100, loss[loss=0.1323, beats_loss=0.008707, ecapa_loss=0.000177, whisper_loss=0.1218, over 22998.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001642, whisper_loss=0.09092, over 3831256.50 frames. ], batch size: 90, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:34:17,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1904990.0, ans=0.125 2024-08-13 00:34:17,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1904990.0, ans=0.0 2024-08-13 00:34:22,290 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 00:34:27,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1905090.0, ans=0.125 2024-08-13 00:34:33,221 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
21 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-13 00:34:34,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1905090.0, ans=0.2 2024-08-13 00:34:36,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1905090.0, ans=0.125 2024-08-13 00:34:57,515 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 00:35:03,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.317e+01 2.588e+01 2.864e+01 4.791e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-13 00:35:21,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1905390.0, ans=0.2 2024-08-13 00:35:23,450 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 00:35:29,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2150, loss[loss=0.1137, beats_loss=0.009693, ecapa_loss=0.0001665, whisper_loss=0.1024, over 20384.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.09162, over 3831375.16 frames. ], batch size: 80, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:35:39,485 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 00:36:22,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1905790.0, ans=0.0 2024-08-13 00:36:24,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1905790.0, ans=0.125 2024-08-13 00:36:48,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1905890.0, ans=0.1 2024-08-13 00:36:51,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2200, loss[loss=0.111, beats_loss=0.01047, ecapa_loss=0.0001987, whisper_loss=0.0985, over 21722.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01076, ecapa_loss=0.0001663, whisper_loss=0.0924, over 3837020.19 frames. ], batch size: 90, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:36:53,390 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 00:36:59,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1905990.0, ans=0.125 2024-08-13 00:37:01,490 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 00:37:07,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. 
limit=15.0 2024-08-13 00:37:45,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.358e+01 2.742e+01 3.274e+01 9.057e+01, threshold=5.483e+01, percent-clipped=3.0 2024-08-13 00:37:48,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1906290.0, ans=0.125 2024-08-13 00:37:59,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-13 00:38:03,871 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 00:38:06,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1906390.0, ans=0.125 2024-08-13 00:38:11,079 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 00:38:13,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2250, loss[loss=0.1098, beats_loss=0.01302, ecapa_loss=0.0001248, whisper_loss=0.09551, over 24136.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001672, whisper_loss=0.09266, over 3847011.25 frames. ], batch size: 90, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:38:19,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-13 00:38:54,589 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 00:39:29,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1906890.0, ans=0.125 2024-08-13 00:39:37,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2300, loss[loss=0.1406, beats_loss=0.007968, ecapa_loss=0.0002104, whisper_loss=0.1305, over 22906.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01078, ecapa_loss=0.0001684, whisper_loss=0.09291, over 3893346.50 frames. ], batch size: 89, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:39:48,096 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:39:54,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1907090.0, ans=0.0 2024-08-13 00:40:01,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1907090.0, ans=0.0 2024-08-13 00:40:03,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1907090.0, ans=0.2 2024-08-13 00:40:28,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1907290.0, ans=10.0 2024-08-13 00:40:32,899 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.474e+01 2.795e+01 3.232e+01 6.818e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-13 00:40:42,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-08-13 00:40:47,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1907390.0, ans=0.125 2024-08-13 00:40:58,050 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 12 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 00:41:00,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2350, loss[loss=0.06709, beats_loss=0.01393, ecapa_loss=0.0001426, whisper_loss=0.05173, over 14656.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001698, whisper_loss=0.09211, over 3846511.02 frames. 
], batch size: 61, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:41:18,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1907590.0, ans=0.04949747468305833 2024-08-13 00:41:18,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1907590.0, ans=0.125 2024-08-13 00:41:36,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1907690.0, ans=0.2 2024-08-13 00:41:38,930 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 00:41:52,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1907790.0, ans=0.0 2024-08-13 00:42:01,122 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 00:42:02,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-13 00:42:02,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-13 00:42:22,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2400, loss[loss=0.1192, beats_loss=0.008509, ecapa_loss=0.0001715, whisper_loss=0.109, over 17193.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01081, ecapa_loss=0.0001686, whisper_loss=0.09143, over 3830855.52 frames. ], batch size: 65, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:42:25,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1907990.0, ans=0.125 2024-08-13 00:42:32,271 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 00:42:42,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1908090.0, ans=0.125 2024-08-13 00:42:48,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1908090.0, ans=0.125 2024-08-13 00:42:51,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1908090.0, ans=0.125 2024-08-13 00:43:03,884 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 00:43:05,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=22.5 2024-08-13 00:43:13,836 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 00:43:16,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.472e+01 2.673e+01 3.015e+01 1.435e+02, threshold=5.346e+01, percent-clipped=1.0 2024-08-13 00:43:28,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1908390.0, ans=0.0 2024-08-13 00:43:45,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2450, loss[loss=0.0911, beats_loss=0.01121, ecapa_loss=0.000162, whisper_loss=0.07828, over 17299.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.000168, whisper_loss=0.0911, over 3849222.17 frames. ], batch size: 70, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:44:07,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2024-08-13 00:44:13,943 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-13 00:44:28,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1908690.0, ans=0.125 2024-08-13 00:44:39,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1908790.0, ans=0.125 2024-08-13 00:44:42,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1908790.0, ans=0.0 2024-08-13 00:45:00,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-13 00:45:04,477 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 00:45:06,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2500, loss[loss=0.09295, beats_loss=0.01211, ecapa_loss=0.0001708, whisper_loss=0.07913, over 22118.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001693, whisper_loss=0.0915, over 3877486.80 frames. ], batch size: 89, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:45:20,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1908990.0, ans=0.125 2024-08-13 00:45:23,001 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 00:45:23,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1909090.0, ans=0.2 2024-08-13 00:45:38,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1909190.0, ans=0.125 2024-08-13 00:45:49,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1909190.0, ans=0.125 2024-08-13 00:45:52,740 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 00:46:01,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.851e+01 3.287e+01 4.773e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-13 00:46:31,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2550, loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001805, whisper_loss=0.09182, over 14563.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001675, whisper_loss=0.09122, over 3862208.33 frames. ], batch size: 60, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:46:33,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1909490.0, ans=0.125 2024-08-13 00:46:38,295 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 00:47:00,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.34 vs. 
limit=15.0 2024-08-13 00:47:09,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1909690.0, ans=0.2 2024-08-13 00:47:10,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1909690.0, ans=0.125 2024-08-13 00:47:33,850 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:47:35,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1909890.0, ans=0.2 2024-08-13 00:47:53,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2600, loss[loss=0.1287, beats_loss=0.008897, ecapa_loss=0.0001675, whisper_loss=0.1181, over 22199.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001676, whisper_loss=0.09156, over 3890283.46 frames. ], batch size: 88, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:47:54,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1909990.0, ans=0.125 2024-08-13 00:48:00,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1909990.0, ans=0.0 2024-08-13 00:48:06,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1909990.0, ans=0.0 2024-08-13 00:48:13,077 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 00:48:27,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1910090.0, ans=10.0 2024-08-13 00:48:32,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1910190.0, ans=0.0 2024-08-13 00:48:51,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1910290.0, ans=0.0 2024-08-13 00:48:52,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.514e+01 2.741e+01 3.048e+01 4.490e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-13 00:48:53,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1910290.0, ans=0.125 2024-08-13 00:48:58,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1910290.0, ans=0.05 2024-08-13 00:49:03,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1910390.0, ans=0.0 2024-08-13 00:49:21,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2650, loss[loss=0.1052, beats_loss=0.01302, ecapa_loss=0.0001414, whisper_loss=0.09079, over 21904.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001677, whisper_loss=0.09134, over 3877077.10 frames. 
], batch size: 88, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:49:25,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1910490.0, ans=0.0 2024-08-13 00:49:32,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1910490.0, ans=0.0 2024-08-13 00:49:48,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1910590.0, ans=0.125 2024-08-13 00:49:48,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1910590.0, ans=0.125 2024-08-13 00:49:53,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1910690.0, ans=0.0 2024-08-13 00:49:53,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-13 00:50:19,534 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 00:50:29,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1910890.0, ans=0.125 2024-08-13 00:50:31,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1910890.0, ans=0.125 2024-08-13 00:50:38,316 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:50:43,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2700, loss[loss=0.1103, beats_loss=0.01019, ecapa_loss=0.0001442, whisper_loss=0.09872, over 19650.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001665, whisper_loss=0.09063, over 3872009.44 frames. 
], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:50:46,735 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 00:50:48,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1910990.0, ans=0.125 2024-08-13 00:50:58,260 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 00:51:29,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1911190.0, ans=0.125 2024-08-13 00:51:38,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.492e+01 2.764e+01 3.227e+01 2.218e+02, threshold=5.527e+01, percent-clipped=1.0 2024-08-13 00:51:42,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1911290.0, ans=0.125 2024-08-13 00:52:06,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2750, loss[loss=0.09787, beats_loss=0.01127, ecapa_loss=0.000142, whisper_loss=0.08518, over 15204.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001667, whisper_loss=0.09073, over 3883165.43 frames. ], batch size: 60, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:52:17,405 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.811e-02 2024-08-13 00:52:17,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1911490.0, ans=0.125 2024-08-13 00:52:22,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1911590.0, ans=0.1 2024-08-13 00:52:23,216 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 00:52:40,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1911690.0, ans=0.015 2024-08-13 00:53:00,902 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 00:53:03,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-13 00:53:04,743 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 00:53:16,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. limit=22.5 2024-08-13 00:53:26,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1911890.0, ans=0.0 2024-08-13 00:53:31,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2800, loss[loss=0.1168, beats_loss=0.01237, ecapa_loss=0.0001647, whisper_loss=0.1028, over 22430.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001654, whisper_loss=0.09107, over 3867617.43 frames. 
], batch size: 88, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:53:31,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1911990.0, ans=0.125 2024-08-13 00:53:36,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1911990.0, ans=0.125 2024-08-13 00:54:28,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.473e+01 2.733e+01 3.017e+01 4.460e+01, threshold=5.467e+01, percent-clipped=0.0 2024-08-13 00:54:48,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.46 vs. limit=10.0 2024-08-13 00:54:57,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2850, loss[loss=0.1007, beats_loss=0.0116, ecapa_loss=0.0001518, whisper_loss=0.0876, over 18691.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001655, whisper_loss=0.09128, over 3822072.14 frames. ], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:55:17,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1912590.0, ans=0.125 2024-08-13 00:55:20,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1912590.0, ans=0.0 2024-08-13 00:55:21,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1912590.0, ans=0.125 2024-08-13 00:55:41,499 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 00:55:59,632 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 00:56:03,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1912890.0, ans=0.05 2024-08-13 00:56:06,486 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-13 00:56:20,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2900, loss[loss=0.08632, beats_loss=0.0121, ecapa_loss=0.0001478, whisper_loss=0.07274, over 19198.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001665, whisper_loss=0.09144, over 3843244.53 frames. ], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:56:26,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1912990.0, ans=0.05 2024-08-13 00:56:49,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1913090.0, ans=0.04949747468305833 2024-08-13 00:57:00,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-13 00:57:18,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.481e+01 2.818e+01 3.186e+01 4.138e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-13 00:57:32,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1913390.0, ans=0.2 2024-08-13 00:57:34,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-13 00:57:45,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 2950, loss[loss=0.08738, beats_loss=0.01231, ecapa_loss=0.000151, whisper_loss=0.07356, over 16629.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001687, whisper_loss=0.09133, over 3868766.63 frames. ], batch size: 66, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:57:49,128 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 00:57:58,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1913490.0, ans=0.125 2024-08-13 00:58:04,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1913590.0, ans=0.125 2024-08-13 00:58:12,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1913590.0, ans=0.0 2024-08-13 00:58:17,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1913690.0, ans=0.125 2024-08-13 00:58:27,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2024-08-13 00:58:52,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=12.0 2024-08-13 00:58:57,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1913890.0, ans=0.125 2024-08-13 00:59:02,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3000, loss[loss=0.1229, beats_loss=0.009744, ecapa_loss=0.0002294, whisper_loss=0.1109, over 19087.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01086, ecapa_loss=0.0001693, whisper_loss=0.09257, over 3914642.88 frames. 
], batch size: 80, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:59:02,556 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 00:59:15,298 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3633, 0.6437, 2.2180, 1.2940, 1.0287, 1.7960, 2.3316, 2.2282], device='cuda:3') 2024-08-13 00:59:43,481 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005759, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 01:00:02,116 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on SV_voxceleb1: loss=0.004628, beats_loss=0, ecapa_loss=0.0004628, whisper_loss=0, over 939242.00 frames. 2024-08-13 01:01:59,828 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on AT_audioset: loss=0.02407, beats_loss=0.02407, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 01:01:59,832 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 01:02:15,327 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 01:02:30,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1914190.0, ans=0.0 2024-08-13 01:02:43,310 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 01:02:50,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.523e+01 2.729e+01 3.233e+01 5.051e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-13 01:02:51,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-13 01:02:56,201 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 01:03:02,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1914390.0, ans=0.1 2024-08-13 01:03:06,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0 2024-08-13 01:03:16,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3050, loss[loss=0.118, beats_loss=0.0108, ecapa_loss=0.000164, whisper_loss=0.1056, over 18583.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01083, ecapa_loss=0.0001707, whisper_loss=0.09259, over 3928251.08 frames. ], batch size: 71, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:03:37,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1914590.0, ans=0.1 2024-08-13 01:03:39,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1914590.0, ans=0.125 2024-08-13 01:03:40,306 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 19 from LS+wenet, 30 from Vox, 45 fro AS 2024-08-13 01:04:14,536 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 01:04:17,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1914890.0, ans=0.1 2024-08-13 01:04:22,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. 
limit=15.0 2024-08-13 01:04:25,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1914890.0, ans=0.05 2024-08-13 01:04:28,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1914890.0, ans=0.125 2024-08-13 01:04:30,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3100, loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001667, whisper_loss=0.09006, over 22809.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01074, ecapa_loss=0.0001715, whisper_loss=0.09318, over 3916926.42 frames. ], batch size: 90, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:04:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1914990.0, ans=15.0 2024-08-13 01:04:51,514 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:04:58,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1915190.0, ans=0.0 2024-08-13 01:05:18,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.430e+01 2.726e+01 3.080e+01 5.396e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 01:05:20,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1915290.0, ans=0.125 2024-08-13 01:05:27,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1915290.0, ans=0.125 2024-08-13 01:05:37,446 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 01:05:40,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1915390.0, ans=0.125 2024-08-13 01:05:44,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3150, loss[loss=0.1111, beats_loss=0.01007, ecapa_loss=0.0001808, whisper_loss=0.09923, over 23722.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01083, ecapa_loss=0.0001697, whisper_loss=0.09264, over 3916336.85 frames. ], batch size: 94, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:06:06,998 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 01:06:22,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1915690.0, ans=0.125 2024-08-13 01:06:23,517 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 01:06:24,874 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 01:06:33,950 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 01:06:53,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1915890.0, ans=0.0 2024-08-13 01:06:58,187 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3200, loss[loss=0.09691, beats_loss=0.01311, ecapa_loss=0.0001731, whisper_loss=0.08207, over 23006.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001692, whisper_loss=0.09224, over 3889695.21 frames. ], batch size: 95, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:06:59,900 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 01:07:02,663 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 01:07:05,635 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 23 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-13 01:07:08,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0 2024-08-13 01:07:29,408 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 01:07:36,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1916190.0, ans=0.2 2024-08-13 01:07:42,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=12.0 2024-08-13 01:07:45,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.363e+01 2.691e+01 2.946e+01 6.786e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 01:07:51,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1916290.0, ans=0.125 2024-08-13 01:07:58,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1916390.0, ans=0.125 2024-08-13 01:08:00,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1916390.0, ans=15.0 2024-08-13 01:08:01,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=1916390.0, ans=0.2 2024-08-13 01:08:10,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3250, loss[loss=0.1134, beats_loss=0.01137, ecapa_loss=0.0001139, whisper_loss=0.1008, over 19737.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01091, ecapa_loss=0.0001688, whisper_loss=0.09219, over 3870607.72 frames. 
], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:08:28,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1916590.0, ans=0.0 2024-08-13 01:08:31,095 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 01:08:31,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1916590.0, ans=0.125 2024-08-13 01:08:42,105 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 01:09:05,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1916790.0, ans=0.1 2024-08-13 01:09:10,927 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 01:09:25,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3300, loss[loss=0.126, beats_loss=0.007426, ecapa_loss=0.0001448, whisper_loss=0.1171, over 16839.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.0001692, whisper_loss=0.09232, over 3883618.61 frames. 
], batch size: 61, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:09:26,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1916990.0, ans=0.0 2024-08-13 01:09:58,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1917190.0, ans=0.04949747468305833 2024-08-13 01:10:07,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1917190.0, ans=0.125 2024-08-13 01:10:13,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.401e+01 2.681e+01 3.036e+01 4.663e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 01:10:38,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3350, loss[loss=0.0892, beats_loss=0.0113, ecapa_loss=0.0001704, whisper_loss=0.07619, over 16695.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001695, whisper_loss=0.09252, over 3901252.52 frames. ], batch size: 69, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:10:39,224 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 01:10:42,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1917490.0, ans=0.0 2024-08-13 01:10:53,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1917590.0, ans=0.0 2024-08-13 01:10:57,117 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 01:11:29,779 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 01:11:55,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3400, loss[loss=0.09794, beats_loss=0.0113, ecapa_loss=0.0001774, whisper_loss=0.08487, over 21176.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0001691, whisper_loss=0.0922, over 3876588.65 frames. ], batch size: 87, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:12:07,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1917990.0, ans=0.125 2024-08-13 01:12:26,309 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 01:12:28,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1918190.0, ans=0.0 2024-08-13 01:12:41,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1918290.0, ans=0.125 2024-08-13 01:12:45,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.443e+01 2.703e+01 3.105e+01 5.409e+01, threshold=5.407e+01, percent-clipped=1.0 2024-08-13 01:12:51,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1918290.0, ans=0.1 2024-08-13 01:12:51,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1918290.0, ans=0.125 2024-08-13 01:12:55,822 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 01:12:57,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.55 vs. limit=10.0 2024-08-13 01:12:58,439 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 01:13:01,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1918390.0, ans=0.125 2024-08-13 01:13:02,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-13 01:13:10,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3450, loss[loss=0.08467, beats_loss=0.01325, ecapa_loss=0.0001362, whisper_loss=0.07006, over 16209.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001694, whisper_loss=0.09186, over 3842237.80 frames. ], batch size: 63, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:13:18,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1918490.0, ans=0.125 2024-08-13 01:13:57,207 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 01:14:06,721 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-13 01:14:19,038 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 01:14:20,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3500, loss[loss=0.09996, beats_loss=0.01191, ecapa_loss=0.0001458, whisper_loss=0.08659, over 20649.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001698, whisper_loss=0.09202, over 3870605.00 frames. ], batch size: 84, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:14:23,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.62 vs. 
limit=12.0 2024-08-13 01:14:29,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1918990.0, ans=0.0 2024-08-13 01:14:34,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1919090.0, ans=0.125 2024-08-13 01:14:38,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1919090.0, ans=0.125 2024-08-13 01:14:42,528 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 01:14:43,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.88 vs. limit=10.0 2024-08-13 01:14:48,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1919190.0, ans=0.2 2024-08-13 01:14:58,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1919190.0, ans=0.1 2024-08-13 01:14:59,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1919190.0, ans=0.125 2024-08-13 01:15:00,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1919290.0, ans=0.125 2024-08-13 01:15:02,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1919290.0, ans=0.125 2024-08-13 01:15:04,666 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 01:15:05,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.466e+01 2.782e+01 3.112e+01 6.873e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-13 01:15:10,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1919290.0, ans=0.2 2024-08-13 01:15:11,293 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 01:15:16,950 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 01:15:18,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1919390.0, ans=0.2 2024-08-13 01:15:29,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3550, loss[loss=0.1052, beats_loss=0.01193, ecapa_loss=0.0001756, whisper_loss=0.09154, over 18922.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001689, whisper_loss=0.09198, over 3866227.75 frames. ], batch size: 73, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:15:31,324 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 01:15:58,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-13 01:15:59,585 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 01:16:07,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1919690.0, ans=0.0 2024-08-13 01:16:13,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1919790.0, ans=0.1 2024-08-13 01:16:20,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1919790.0, ans=0.2 2024-08-13 01:16:28,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1919890.0, ans=0.025 2024-08-13 01:16:36,606 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 01:16:40,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3600, loss[loss=0.1038, beats_loss=0.01115, ecapa_loss=0.0001858, whisper_loss=0.09079, over 18820.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01081, ecapa_loss=0.0001691, whisper_loss=0.09194, over 3847238.54 frames. ], batch size: 76, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:17:00,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-13 01:17:04,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1920090.0, ans=0.125 2024-08-13 01:17:09,521 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 37 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 01:17:30,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.424e+01 2.680e+01 3.106e+01 1.010e+02, threshold=5.360e+01, percent-clipped=5.0 2024-08-13 01:17:45,551 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 01:17:51,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1920390.0, ans=0.125 2024-08-13 01:17:53,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3650, loss[loss=0.1037, beats_loss=0.0112, ecapa_loss=0.000176, whisper_loss=0.09076, over 19847.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001705, whisper_loss=0.09192, over 3832337.59 frames. ], batch size: 79, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:17:56,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1920490.0, ans=0.2 2024-08-13 01:18:00,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-08-13 01:18:02,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-13 01:18:11,025 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.468e-02 2024-08-13 01:18:11,985 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 10 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 01:18:33,280 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 01:18:38,693 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 01:18:41,748 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:18:50,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1920890.0, ans=0.125 2024-08-13 01:19:03,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3700, loss[loss=0.08549, beats_loss=0.01083, ecapa_loss=0.0001807, whisper_loss=0.07286, over 13812.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01083, ecapa_loss=0.0001692, whisper_loss=0.09187, over 3829291.57 frames. ], batch size: 56, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:19:14,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=15.0 2024-08-13 01:19:17,862 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 01:19:40,203 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 01:19:44,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1921290.0, ans=0.125 2024-08-13 01:19:44,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1921290.0, ans=0.125 2024-08-13 01:19:44,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1921290.0, ans=0.1 2024-08-13 01:19:47,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2024-08-13 01:19:49,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.425e+01 2.811e+01 3.262e+01 7.758e+01, threshold=5.621e+01, percent-clipped=2.0 2024-08-13 01:20:04,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1921390.0, ans=0.0 2024-08-13 01:20:13,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3750, loss[loss=0.0776, beats_loss=0.01383, ecapa_loss=0.0002155, whisper_loss=0.06161, over 18074.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001695, whisper_loss=0.09161, over 3841348.56 frames. ], batch size: 83, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:20:14,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1921490.0, ans=0.125 2024-08-13 01:20:20,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1921490.0, ans=0.0 2024-08-13 01:20:23,931 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 01:20:28,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-13 01:20:32,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1921590.0, ans=0.125 2024-08-13 01:20:39,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1921590.0, ans=0.125 2024-08-13 01:21:01,413 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 01:21:04,229 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 01:21:12,073 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 01:21:18,839 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 01:21:22,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1921990.0, ans=0.05 2024-08-13 01:21:23,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3800, loss[loss=0.1071, beats_loss=0.009236, ecapa_loss=0.0001534, whisper_loss=0.09633, over 21733.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001699, whisper_loss=0.09099, over 3861312.69 frames. ], batch size: 84, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:21:37,251 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 01:21:41,655 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 01:21:46,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1922090.0, ans=0.125 2024-08-13 01:21:54,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1922190.0, ans=0.0 2024-08-13 01:22:06,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1922290.0, ans=0.125 2024-08-13 01:22:08,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.489e+01 2.785e+01 3.114e+01 6.895e+01, threshold=5.569e+01, percent-clipped=1.0 2024-08-13 01:22:29,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1922390.0, ans=0.125 2024-08-13 01:22:32,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3850, loss[loss=0.1212, beats_loss=0.008233, ecapa_loss=0.0002136, whisper_loss=0.1109, over 22089.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01102, ecapa_loss=0.0001695, whisper_loss=0.09137, over 3870288.06 frames. ], batch size: 90, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:22:41,155 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 01:22:55,697 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-13 01:23:07,254 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.705e-02 2024-08-13 01:23:07,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1922690.0, ans=0.125 2024-08-13 01:23:15,701 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 01:23:32,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1922890.0, ans=0.1 2024-08-13 01:23:42,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3900, loss[loss=0.07463, beats_loss=0.009252, ecapa_loss=0.0001997, whisper_loss=0.06338, over 13287.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01098, ecapa_loss=0.0001704, whisper_loss=0.0918, over 3879460.94 frames. ], batch size: 55, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:23:51,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1922990.0, ans=0.1 2024-08-13 01:23:53,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1922990.0, ans=0.0 2024-08-13 01:23:54,182 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 01:24:11,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1923190.0, ans=0.0 2024-08-13 01:24:11,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-13 01:24:25,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1923290.0, ans=0.0 2024-08-13 01:24:27,565 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 01:24:28,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.595e+01 2.867e+01 3.243e+01 6.009e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-13 01:24:43,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1923390.0, ans=0.125 2024-08-13 01:24:47,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-13 01:24:49,428 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 01:24:52,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 3950, loss[loss=0.08798, beats_loss=0.01259, ecapa_loss=0.0001556, whisper_loss=0.07384, over 20096.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001708, whisper_loss=0.09264, over 3931892.94 frames. ], batch size: 77, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:25:01,263 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 01:25:12,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1923590.0, ans=0.125 2024-08-13 01:25:12,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1923590.0, ans=0.1 2024-08-13 01:25:21,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2024-08-13 01:25:25,028 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 01:25:34,509 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 01:25:40,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1923790.0, ans=0.0 2024-08-13 01:25:47,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:25:58,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1923890.0, ans=10.0 2024-08-13 01:25:59,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1923890.0, ans=0.1 2024-08-13 01:26:02,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4000, loss[loss=0.09372, beats_loss=0.01113, ecapa_loss=0.0001844, whisper_loss=0.08075, over 17682.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001711, whisper_loss=0.09221, over 3918117.53 frames. ], batch size: 71, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:26:04,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-13 01:26:18,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.88 vs. 
limit=15.0 2024-08-13 01:26:34,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1924190.0, ans=0.125 2024-08-13 01:26:37,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1924190.0, ans=0.0 2024-08-13 01:26:47,230 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:26:47,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1924290.0, ans=0.0 2024-08-13 01:26:47,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.537e+01 2.883e+01 3.271e+01 5.034e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-13 01:27:09,475 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 01:27:12,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4050, loss[loss=0.107, beats_loss=0.01072, ecapa_loss=0.0001352, whisper_loss=0.09489, over 17546.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001706, whisper_loss=0.09279, over 3886928.33 frames. ], batch size: 67, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:27:16,695 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 01:27:23,521 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 01:27:26,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1924590.0, ans=0.0 2024-08-13 01:27:37,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1924590.0, ans=0.125 2024-08-13 01:28:04,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-13 01:28:06,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1924890.0, ans=0.2 2024-08-13 01:28:07,863 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 01:28:16,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-13 01:28:18,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2024-08-13 01:28:21,337 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4100, loss[loss=0.106, beats_loss=0.01162, ecapa_loss=0.0001386, whisper_loss=0.09295, over 22428.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001712, whisper_loss=0.09206, over 3877865.72 frames. 
], batch size: 89, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:28:26,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1924990.0, ans=0.125 2024-08-13 01:28:33,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1924990.0, ans=0.1 2024-08-13 01:28:33,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1924990.0, ans=0.95 2024-08-13 01:28:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1925090.0, ans=0.1 2024-08-13 01:28:41,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1925090.0, ans=0.2 2024-08-13 01:28:47,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925090.0, ans=0.1 2024-08-13 01:28:47,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1925090.0, ans=0.2 2024-08-13 01:28:53,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1925190.0, ans=0.02 2024-08-13 01:29:01,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925190.0, ans=0.1 2024-08-13 01:29:02,657 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 01:29:08,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.339e+01 2.647e+01 3.027e+01 3.702e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-13 01:29:11,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1925290.0, ans=0.1 2024-08-13 01:29:13,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1925290.0, ans=10.0 2024-08-13 01:29:27,093 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 01:29:32,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4150, loss[loss=0.1211, beats_loss=0.008909, ecapa_loss=0.0001446, whisper_loss=0.1108, over 16483.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001714, whisper_loss=0.09154, over 3839281.34 frames. ], batch size: 61, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:29:34,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1925490.0, ans=0.125 2024-08-13 01:30:04,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1925690.0, ans=0.0 2024-08-13 01:30:05,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1925690.0, ans=0.2 2024-08-13 01:30:09,564 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 01:30:15,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1925790.0, ans=0.125 2024-08-13 01:30:15,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1925790.0, ans=0.1 2024-08-13 01:30:43,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4200, loss[loss=0.08604, beats_loss=0.01282, ecapa_loss=0.000135, whisper_loss=0.07187, over 21338.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001702, whisper_loss=0.09124, over 3879484.06 frames. ], batch size: 83, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:30:47,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1925990.0, ans=0.0 2024-08-13 01:30:51,625 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 01:30:52,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1925990.0, ans=0.125 2024-08-13 01:30:53,253 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 01:30:57,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-08-13 01:30:59,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1926090.0, ans=0.0 2024-08-13 01:31:02,913 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:31:19,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1926190.0, ans=0.1 2024-08-13 01:31:22,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1926190.0, ans=0.5 2024-08-13 01:31:28,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.387e+01 2.732e+01 2.995e+01 7.981e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-13 01:31:39,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1926390.0, ans=0.125 2024-08-13 01:31:49,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1926390.0, ans=0.125 2024-08-13 01:31:52,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4250, loss[loss=0.1108, beats_loss=0.01128, ecapa_loss=0.0001567, whisper_loss=0.09799, over 22455.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01112, ecapa_loss=0.0001689, whisper_loss=0.09048, over 3870750.33 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:31:54,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1926490.0, ans=0.125 2024-08-13 01:32:01,146 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 01:32:04,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. 
limit=6.0 2024-08-13 01:32:05,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1926590.0, ans=0.125 2024-08-13 01:32:11,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1926590.0, ans=0.125 2024-08-13 01:32:15,200 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 01:32:16,720 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 01:32:19,385 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 01:32:31,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1926690.0, ans=0.1 2024-08-13 01:32:31,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-13 01:32:40,369 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-13 01:32:52,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-13 01:32:59,748 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-13 01:33:02,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4300, loss[loss=0.09407, beats_loss=0.01109, ecapa_loss=0.0001714, whisper_loss=0.08127, over 19483.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01108, ecapa_loss=0.0001688, whisper_loss=0.09069, over 3878814.67 frames. ], batch size: 81, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:33:09,375 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 01:33:23,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2024-08-13 01:33:44,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-13 01:33:48,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.403e+01 2.611e+01 3.081e+01 4.718e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-13 01:33:48,467 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 01:34:07,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2024-08-13 01:34:11,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4350, loss[loss=0.1059, beats_loss=0.01093, ecapa_loss=0.0001367, whisper_loss=0.09364, over 17350.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01095, ecapa_loss=0.0001688, whisper_loss=0.09041, over 3855821.29 frames. ], batch size: 64, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:34:29,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1927590.0, ans=0.0 2024-08-13 01:34:35,618 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 01:34:43,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1927690.0, ans=0.125 2024-08-13 01:34:46,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=12.0 2024-08-13 01:34:58,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-13 01:35:03,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1927790.0, ans=0.0 2024-08-13 01:35:07,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1927890.0, ans=0.0 2024-08-13 01:35:13,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1927890.0, ans=0.125 2024-08-13 01:35:21,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4400, loss[loss=0.06704, beats_loss=0.01498, ecapa_loss=0.0001705, whisper_loss=0.05036, over 13987.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001698, whisper_loss=0.09134, over 3857797.58 frames. ], batch size: 59, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:35:36,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1928090.0, ans=0.125 2024-08-13 01:35:44,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1928090.0, ans=0.125 2024-08-13 01:36:06,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.420e+01 2.637e+01 3.058e+01 4.603e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 01:36:17,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1928390.0, ans=0.125 2024-08-13 01:36:21,511 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 01:36:30,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4450, loss[loss=0.1178, beats_loss=0.01108, ecapa_loss=0.000203, whisper_loss=0.1047, over 15709.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001711, whisper_loss=0.09183, over 3870969.43 frames. ], batch size: 64, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:36:32,084 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-13 01:36:36,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2024-08-13 01:36:44,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-13 01:36:59,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1928690.0, ans=0.0 2024-08-13 01:37:07,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1928690.0, ans=0.0 2024-08-13 01:37:18,267 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 01:37:21,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-08-13 01:37:22,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2024-08-13 01:37:34,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1928890.0, ans=0.95 2024-08-13 01:37:35,981 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-13 01:37:39,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4500, loss[loss=0.1204, beats_loss=0.009913, ecapa_loss=0.000188, whisper_loss=0.1086, over 17060.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01073, ecapa_loss=0.0001711, whisper_loss=0.09241, over 3882212.72 frames. ], batch size: 67, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:37:49,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-13 01:38:09,685 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 01:38:12,429 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 01:38:17,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1929190.0, ans=0.125 2024-08-13 01:38:25,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1929290.0, ans=0.2 2024-08-13 01:38:26,408 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 01:38:26,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1929290.0, ans=0.125 2024-08-13 01:38:27,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.366e+01 2.717e+01 3.132e+01 4.916e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-13 01:38:33,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1929290.0, ans=15.0 2024-08-13 01:38:49,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4550, loss[loss=0.1028, beats_loss=0.01033, ecapa_loss=0.0001796, whisper_loss=0.0907, over 17232.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001711, whisper_loss=0.09162, over 3903343.58 frames. ], batch size: 67, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:39:31,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1929790.0, ans=0.2 2024-08-13 01:39:37,598 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-13 01:39:42,001 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 01:39:42,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1929790.0, ans=0.125 2024-08-13 01:39:44,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1929890.0, ans=0.0 2024-08-13 01:39:50,001 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 01:39:59,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4600, loss[loss=0.1057, beats_loss=0.009598, ecapa_loss=0.0001694, whisper_loss=0.09439, over 18222.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001706, whisper_loss=0.09099, over 3873730.81 frames. ], batch size: 71, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:40:01,232 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-13 01:40:04,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1929990.0, ans=0.0 2024-08-13 01:40:06,164 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:40:24,959 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
23 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 01:40:27,785 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 01:40:34,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1930190.0, ans=0.125 2024-08-13 01:40:39,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1930290.0, ans=0.125 2024-08-13 01:40:42,184 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 01:40:46,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.515e+01 2.755e+01 3.045e+01 4.770e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-13 01:40:49,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1930290.0, ans=0.1 2024-08-13 01:40:53,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1930390.0, ans=0.125 2024-08-13 01:40:56,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-13 01:41:03,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1930390.0, ans=0.2 2024-08-13 01:41:05,302 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 01:41:07,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4650, loss[loss=0.1155, beats_loss=0.009849, ecapa_loss=0.0001913, whisper_loss=0.1038, over 23041.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.000171, whisper_loss=0.09051, over 3877306.60 frames. 
], batch size: 94, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:41:23,435 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 01:41:23,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1930590.0, ans=0.07 2024-08-13 01:41:25,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1930590.0, ans=0.125 2024-08-13 01:41:44,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1930690.0, ans=0.1 2024-08-13 01:41:53,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1930790.0, ans=0.1 2024-08-13 01:41:56,578 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 01:42:03,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1930890.0, ans=0.2 2024-08-13 01:42:04,527 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 01:42:07,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1930890.0, ans=0.0 2024-08-13 01:42:09,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2024-08-13 01:42:10,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1930890.0, ans=0.125 2024-08-13 01:42:14,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1930890.0, ans=0.125 2024-08-13 01:42:16,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4700, loss[loss=0.1044, beats_loss=0.01292, ecapa_loss=0.0001375, whisper_loss=0.0901, over 22947.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001699, whisper_loss=0.09072, over 3886957.88 frames. ], batch size: 91, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:42:26,954 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 01:42:42,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1931090.0, ans=0.125 2024-08-13 01:42:56,394 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 01:43:03,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.542e+01 2.823e+01 3.098e+01 3.628e+02, threshold=5.646e+01, percent-clipped=2.0 2024-08-13 01:43:08,957 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 01:43:22,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1931390.0, ans=0.0 2024-08-13 01:43:22,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1931390.0, ans=0.1 2024-08-13 01:43:25,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1931490.0, ans=0.125 2024-08-13 01:43:26,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4750, loss[loss=0.09506, beats_loss=0.009521, ecapa_loss=0.0001393, whisper_loss=0.08414, over 16811.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001706, whisper_loss=0.09089, over 3897230.88 frames. ], batch size: 65, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:43:30,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1931490.0, ans=0.1 2024-08-13 01:43:37,141 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 01:43:44,171 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 01:43:49,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1931590.0, ans=0.02 2024-08-13 01:43:55,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1931690.0, ans=0.07 2024-08-13 01:44:14,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1931790.0, ans=0.04949747468305833 2024-08-13 01:44:16,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1931790.0, ans=0.125 2024-08-13 01:44:17,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1931790.0, ans=0.125 2024-08-13 01:44:21,836 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 01:44:23,270 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 01:44:29,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1931890.0, ans=0.1 2024-08-13 01:44:31,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1931890.0, ans=0.2 2024-08-13 01:44:33,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1931890.0, ans=0.125 2024-08-13 01:44:42,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4800, loss[loss=0.1129, beats_loss=0.007505, ecapa_loss=0.0001534, whisper_loss=0.1038, over 14917.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.000171, whisper_loss=0.09151, over 3881027.06 frames. 
], batch size: 56, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:44:49,680 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 01:45:02,119 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 01:45:10,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1932090.0, ans=0.125 2024-08-13 01:45:21,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1932190.0, ans=0.0 2024-08-13 01:45:25,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1932190.0, ans=0.125 2024-08-13 01:45:36,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2024-08-13 01:45:49,577 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.506e+01 2.786e+01 3.078e+01 4.876e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-13 01:46:00,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1932390.0, ans=0.0 2024-08-13 01:46:04,982 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 01:46:21,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4850, loss[loss=0.1132, beats_loss=0.009378, ecapa_loss=0.0001652, whisper_loss=0.1022, over 15350.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001715, whisper_loss=0.09155, over 3867406.44 frames. 
], batch size: 58, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:46:52,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1932590.0, ans=0.0 2024-08-13 01:46:56,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1932590.0, ans=0.0 2024-08-13 01:47:03,787 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-13 01:47:16,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1932690.0, ans=0.2 2024-08-13 01:47:32,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-13 01:47:37,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1932790.0, ans=0.0 2024-08-13 01:47:57,900 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 01:48:04,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1932890.0, ans=0.125 2024-08-13 01:48:10,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1932990.0, ans=0.125 2024-08-13 01:48:11,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4900, loss[loss=0.09364, beats_loss=0.01283, ecapa_loss=0.0001569, whisper_loss=0.07923, over 22497.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001731, whisper_loss=0.0911, over 3894428.41 frames. 
], batch size: 89, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:48:13,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1932990.0, ans=0.0 2024-08-13 01:48:25,939 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-13 01:49:31,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1933290.0, ans=0.2 2024-08-13 01:49:32,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.462e+01 2.765e+01 3.056e+01 4.985e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-13 01:49:36,747 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 01:49:49,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1933390.0, ans=0.0 2024-08-13 01:50:03,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-13 01:50:03,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 4950, loss[loss=0.1111, beats_loss=0.01106, ecapa_loss=0.0001683, whisper_loss=0.09837, over 17183.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001731, whisper_loss=0.09096, over 3875379.01 frames. 
], batch size: 69, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:50:04,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1933490.0, ans=0.125 2024-08-13 01:50:16,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1933490.0, ans=0.125 2024-08-13 01:50:22,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1933590.0, ans=0.125 2024-08-13 01:50:26,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1933590.0, ans=0.125 2024-08-13 01:50:48,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1933790.0, ans=0.0 2024-08-13 01:51:01,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-13 01:51:09,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1933890.0, ans=0.1 2024-08-13 01:51:13,704 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 01:51:20,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5000, loss[loss=0.109, beats_loss=0.008208, ecapa_loss=0.0002261, whisper_loss=0.09853, over 20734.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001723, whisper_loss=0.09089, over 3889787.33 frames. 
], batch size: 86, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:51:21,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1933990.0, ans=0.125 2024-08-13 01:51:22,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1933990.0, ans=0.125 2024-08-13 01:51:24,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1933990.0, ans=0.125 2024-08-13 01:51:26,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1933990.0, ans=0.125 2024-08-13 01:51:34,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1933990.0, ans=0.0 2024-08-13 01:51:34,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-13 01:51:35,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1934090.0, ans=0.125 2024-08-13 01:51:38,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-08-13 01:51:42,898 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
28 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 01:51:43,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1934090.0, ans=0.07 2024-08-13 01:51:44,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1934090.0, ans=0.0 2024-08-13 01:51:52,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1934190.0, ans=0.125 2024-08-13 01:52:00,830 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:52:03,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1934190.0, ans=0.015 2024-08-13 01:52:13,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.373e+01 2.737e+01 3.184e+01 6.268e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 01:52:25,001 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 01:52:28,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1934390.0, ans=0.0 2024-08-13 01:52:36,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1934390.0, ans=0.0 2024-08-13 01:52:39,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5050, loss[loss=0.09871, beats_loss=0.01167, ecapa_loss=0.0001455, whisper_loss=0.08559, over 22258.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001718, whisper_loss=0.09113, over 3888308.07 frames. 
], batch size: 89, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:52:55,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1934590.0, ans=0.125 2024-08-13 01:53:37,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1934790.0, ans=0.2 2024-08-13 01:53:38,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1934790.0, ans=0.1 2024-08-13 01:53:41,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1934790.0, ans=0.125 2024-08-13 01:53:45,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1934890.0, ans=0.0 2024-08-13 01:53:53,713 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-13 01:54:00,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5100, loss[loss=0.1013, beats_loss=0.01182, ecapa_loss=0.0001537, whisper_loss=0.0879, over 21421.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001701, whisper_loss=0.09161, over 3897972.64 frames. ], batch size: 86, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:54:02,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1934990.0, ans=0.125 2024-08-13 01:54:16,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1935090.0, ans=0.125 2024-08-13 01:54:26,377 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 01:54:56,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.475e+01 2.679e+01 3.018e+01 4.914e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-13 01:54:57,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1935290.0, ans=0.0 2024-08-13 01:54:57,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1935290.0, ans=0.5 2024-08-13 01:55:07,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1935390.0, ans=0.1 2024-08-13 01:55:22,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5150, loss[loss=0.09809, beats_loss=0.01309, ecapa_loss=0.0001393, whisper_loss=0.08361, over 22355.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001703, whisper_loss=0.09158, over 3874081.99 frames. ], batch size: 92, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:55:24,711 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 01:55:28,431 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 01:55:30,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1935490.0, ans=0.125 2024-08-13 01:55:33,731 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 01:55:35,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1935490.0, ans=0.0 2024-08-13 01:55:37,004 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 01:55:41,929 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 01:56:06,360 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 01:56:16,200 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 01:56:19,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1935790.0, ans=0.125 2024-08-13 01:56:47,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5200, loss[loss=0.1034, beats_loss=0.01236, ecapa_loss=0.0001583, whisper_loss=0.0895, over 19368.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001695, whisper_loss=0.0919, over 3837450.44 frames. ], batch size: 78, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:56:51,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1935990.0, ans=0.1 2024-08-13 01:56:56,079 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 01:57:14,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=15.0 2024-08-13 01:57:20,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-13 01:57:35,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1936290.0, ans=0.125 2024-08-13 01:57:41,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. 
limit=10.0 2024-08-13 01:57:42,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.433e+01 2.676e+01 3.023e+01 1.012e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-13 01:57:46,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1936290.0, ans=0.125 2024-08-13 01:57:51,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1936390.0, ans=0.125 2024-08-13 01:57:54,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.40 vs. limit=10.0 2024-08-13 01:58:08,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5250, loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.0001292, whisper_loss=0.09054, over 15761.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01102, ecapa_loss=0.0001688, whisper_loss=0.09094, over 3822379.23 frames. ], batch size: 59, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:58:22,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:36,525 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 01:58:53,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=12.0 2024-08-13 01:59:01,111 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:59:30,439 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5300, loss[loss=0.09416, beats_loss=0.01126, ecapa_loss=0.0001633, whisper_loss=0.08127, over 18488.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001698, whisper_loss=0.09186, over 3858445.96 frames. 
], batch size: 73, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:59:38,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=15.0 2024-08-13 01:59:40,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1936990.0, ans=0.0 2024-08-13 01:59:52,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1937090.0, ans=0.2 2024-08-13 02:00:01,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1937090.0, ans=0.125 2024-08-13 02:00:21,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1937290.0, ans=0.125 2024-08-13 02:00:23,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1937290.0, ans=0.2 2024-08-13 02:00:25,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.483e+01 2.816e+01 3.213e+01 1.142e+02, threshold=5.632e+01, percent-clipped=3.0 2024-08-13 02:00:48,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1937390.0, ans=0.125 2024-08-13 02:00:51,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5350, loss[loss=0.09693, beats_loss=0.01154, ecapa_loss=0.0001353, whisper_loss=0.08404, over 14467.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001674, whisper_loss=0.09172, over 3901586.31 frames. ], batch size: 57, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:00:53,194 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 02:00:58,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1937490.0, ans=0.05 2024-08-13 02:01:14,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:21,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1937590.0, ans=10.0 2024-08-13 02:01:22,063 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 02:01:28,384 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 02:01:28,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1937690.0, ans=0.1 2024-08-13 02:01:41,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1937790.0, ans=0.1 2024-08-13 02:01:41,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1937790.0, ans=0.0 2024-08-13 02:01:46,308 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 02:01:51,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1937790.0, ans=0.125 2024-08-13 02:01:55,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1937890.0, ans=0.125 2024-08-13 02:02:04,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1937890.0, ans=0.1 2024-08-13 02:02:12,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1937990.0, ans=0.125 2024-08-13 02:02:13,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-08-13 02:02:13,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5400, loss[loss=0.102, beats_loss=0.01125, ecapa_loss=0.0001545, whisper_loss=0.08922, over 18526.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001676, whisper_loss=0.09148, over 3894539.91 frames. ], batch size: 73, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:02:21,086 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 18 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 02:02:27,649 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 02:02:31,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1938090.0, ans=0.0 2024-08-13 02:03:09,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.492e+01 2.751e+01 3.252e+01 5.304e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-13 02:03:22,298 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-13 02:03:37,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5450, loss[loss=0.1002, beats_loss=0.01114, ecapa_loss=0.0001477, whisper_loss=0.08761, over 18264.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.000168, whisper_loss=0.09157, over 3897573.55 frames. ], batch size: 72, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:03:56,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1938590.0, ans=0.0 2024-08-13 02:04:01,197 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 02:04:04,503 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 02:04:11,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1938690.0, ans=0.2 2024-08-13 02:04:16,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.894e-02 2024-08-13 02:04:18,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1938690.0, ans=0.2 2024-08-13 02:04:25,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1938790.0, ans=0.125 2024-08-13 02:04:59,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5500, loss[loss=0.08856, beats_loss=0.01032, ecapa_loss=0.0001477, whisper_loss=0.07676, over 14160.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001674, whisper_loss=0.09164, over 3923521.90 frames. 
], batch size: 54, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:05:04,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1938990.0, ans=0.125 2024-08-13 02:05:14,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1939090.0, ans=0.125 2024-08-13 02:05:23,003 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 02:05:39,041 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 02:05:41,657 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 02:05:46,743 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:05:52,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.471e+01 2.738e+01 3.080e+01 7.605e+01, threshold=5.476e+01, percent-clipped=2.0 2024-08-13 02:06:07,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1939390.0, ans=0.1 2024-08-13 02:06:11,815 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 02:06:18,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=1939490.0, ans=15.0 2024-08-13 02:06:18,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5550, loss[loss=0.09886, beats_loss=0.01151, ecapa_loss=0.000201, whisper_loss=0.08534, over 22250.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09108, over 3916359.26 frames. 
], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:06:24,786 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 02:06:31,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:06:33,527 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 02:06:59,453 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.066e+00 2024-08-13 02:07:02,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1939690.0, ans=0.125 2024-08-13 02:07:07,276 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 02:07:37,501 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 02:07:38,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5600, loss[loss=0.09935, beats_loss=0.01225, ecapa_loss=0.0001919, whisper_loss=0.08518, over 21353.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001701, whisper_loss=0.09132, over 3936423.31 frames. 
], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:07:40,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1939990.0, ans=0.2 2024-08-13 02:07:49,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1939990.0, ans=0.125 2024-08-13 02:07:58,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1940090.0, ans=0.1 2024-08-13 02:08:31,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1940290.0, ans=0.2 2024-08-13 02:08:35,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.482e+01 2.705e+01 3.003e+01 6.205e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 02:08:36,086 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 02:08:49,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1940390.0, ans=0.1 2024-08-13 02:09:01,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5650, loss[loss=0.09736, beats_loss=0.01146, ecapa_loss=0.0001453, whisper_loss=0.08445, over 18865.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001696, whisper_loss=0.09184, over 3930017.82 frames. 
], batch size: 76, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:09:04,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1940490.0, ans=0.1 2024-08-13 02:09:29,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1940590.0, ans=0.125 2024-08-13 02:09:41,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1940690.0, ans=0.1 2024-08-13 02:09:42,992 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 41 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 02:09:45,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1940690.0, ans=0.2 2024-08-13 02:10:09,790 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 02:10:18,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1940890.0, ans=0.125 2024-08-13 02:10:22,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5700, loss[loss=0.1065, beats_loss=0.007291, ecapa_loss=0.0002096, whisper_loss=0.09708, over 18507.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001687, whisper_loss=0.09193, over 3952989.78 frames. ], batch size: 75, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:10:25,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0 2024-08-13 02:10:34,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1940990.0, ans=0.125 2024-08-13 02:10:36,989 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 02:10:43,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2024-08-13 02:10:56,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1941190.0, ans=0.125 2024-08-13 02:11:04,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1941190.0, ans=0.0 2024-08-13 02:11:10,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1941290.0, ans=0.125 2024-08-13 02:11:16,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.518e+01 2.759e+01 3.173e+01 1.965e+02, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 02:11:41,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5750, loss[loss=0.07441, beats_loss=0.01314, ecapa_loss=0.0001345, whisper_loss=0.05993, over 13804.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001694, whisper_loss=0.09183, over 3904320.06 frames. ], batch size: 54, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:11:43,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1941490.0, ans=0.1 2024-08-13 02:11:46,771 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-13 02:11:54,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. 
limit=15.0 2024-08-13 02:12:03,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1941590.0, ans=0.0 2024-08-13 02:12:13,634 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 13 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 02:12:25,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1941690.0, ans=0.0 2024-08-13 02:12:43,744 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 02:13:02,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5800, loss[loss=0.09629, beats_loss=0.008318, ecapa_loss=0.0001624, whisper_loss=0.08635, over 17776.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001697, whisper_loss=0.0911, over 3879012.83 frames. ], batch size: 66, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:13:11,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1941990.0, ans=0.125 2024-08-13 02:13:19,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2024-08-13 02:13:40,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=22.5 2024-08-13 02:13:43,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1942190.0, ans=0.05 2024-08-13 02:13:43,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.19 vs. 
limit=15.0 2024-08-13 02:13:57,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.443e+01 2.748e+01 3.161e+01 4.611e+01, threshold=5.495e+01, percent-clipped=0.0 2024-08-13 02:14:24,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5850, loss[loss=0.09748, beats_loss=0.01208, ecapa_loss=0.0001563, whisper_loss=0.08383, over 14088.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001698, whisper_loss=0.0914, over 3876211.90 frames. ], batch size: 55, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:14:32,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1942490.0, ans=0.125 2024-08-13 02:14:40,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1942590.0, ans=0.0 2024-08-13 02:14:49,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1942590.0, ans=0.125 2024-08-13 02:14:58,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1942690.0, ans=0.0 2024-08-13 02:15:00,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1942690.0, ans=0.2 2024-08-13 02:15:39,812 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 02:15:47,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5900, loss[loss=0.1056, beats_loss=0.009318, ecapa_loss=0.0001871, whisper_loss=0.0944, over 20860.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001694, whisper_loss=0.09139, over 3863682.91 frames. ], batch size: 86, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:15:49,573 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 02:16:23,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1943190.0, ans=0.125 2024-08-13 02:16:40,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.528e+01 2.790e+01 3.084e+01 1.766e+02, threshold=5.581e+01, percent-clipped=1.0 2024-08-13 02:16:43,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1943290.0, ans=0.0 2024-08-13 02:17:01,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1943390.0, ans=0.1 2024-08-13 02:17:04,795 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 02:17:07,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 5950, loss[loss=0.0993, beats_loss=0.01019, ecapa_loss=0.000179, whisper_loss=0.08731, over 20676.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001691, whisper_loss=0.09075, over 3836916.25 frames. ], batch size: 82, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:17:14,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1943490.0, ans=0.125 2024-08-13 02:17:17,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1943490.0, ans=0.125 2024-08-13 02:17:22,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1943590.0, ans=0.125 2024-08-13 02:17:51,480 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 02:17:54,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. 
limit=22.5 2024-08-13 02:17:58,721 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 02:18:03,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1943790.0, ans=0.125 2024-08-13 02:18:03,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1943790.0, ans=0.125 2024-08-13 02:18:05,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-13 02:18:16,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1943890.0, ans=0.07 2024-08-13 02:18:22,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-13 02:18:27,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1943990.0, ans=0.125 2024-08-13 02:18:28,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6000, loss[loss=0.09579, beats_loss=0.01199, ecapa_loss=0.0001643, whisper_loss=0.08216, over 17262.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01088, ecapa_loss=0.0001695, whisper_loss=0.09115, over 3864624.72 frames. ], batch size: 66, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:18:28,327 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 02:19:07,068 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005835, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 02:19:25,597 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on SV_voxceleb1: loss=0.004586, beats_loss=0, ecapa_loss=0.0004586, whisper_loss=0, over 939242.00 frames. 
2024-08-13 02:21:14,566 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on AT_audioset: loss=0.02397, beats_loss=0.02397, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 02:21:14,570 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 02:21:33,544 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.009e+01 2024-08-13 02:22:04,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1944290.0, ans=0.125 2024-08-13 02:22:08,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1944290.0, ans=0.0 2024-08-13 02:22:10,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.472e+01 2.800e+01 3.130e+01 4.518e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-13 02:22:14,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1944290.0, ans=0.0 2024-08-13 02:22:34,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2024-08-13 02:22:35,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1944490.0, ans=0.125 2024-08-13 02:22:35,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6050, loss[loss=0.1072, beats_loss=0.01047, ecapa_loss=0.0001759, whisper_loss=0.09498, over 22881.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001693, whisper_loss=0.0913, over 3841053.77 frames. 
], batch size: 93, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:22:48,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0 2024-08-13 02:22:52,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-13 02:23:03,966 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 10 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 02:23:04,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1944590.0, ans=0.125 2024-08-13 02:23:29,091 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 02:23:39,317 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 02:23:46,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1944890.0, ans=0.1 2024-08-13 02:23:54,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1944890.0, ans=0.125 2024-08-13 02:23:58,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6100, loss[loss=0.0969, beats_loss=0.0103, ecapa_loss=0.0002165, whisper_loss=0.08443, over 14673.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.0001693, whisper_loss=0.09067, over 3847952.19 frames. ], batch size: 60, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:23:59,758 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 02:24:06,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1944990.0, ans=0.125 2024-08-13 02:24:19,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.72 vs. limit=22.5 2024-08-13 02:24:33,711 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 02:24:47,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1945290.0, ans=0.0 2024-08-13 02:24:48,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1945290.0, ans=0.1 2024-08-13 02:24:53,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.564e+01 2.945e+01 3.314e+01 6.954e+01, threshold=5.890e+01, percent-clipped=1.0 2024-08-13 02:25:03,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1945390.0, ans=0.125 2024-08-13 02:25:11,359 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 02:25:16,353 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 37 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 02:25:21,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6150, loss[loss=0.09918, beats_loss=0.01111, ecapa_loss=0.0001692, whisper_loss=0.08638, over 20802.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01097, ecapa_loss=0.00017, whisper_loss=0.09024, over 3880103.56 frames. ], batch size: 86, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:25:34,824 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 02:26:05,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1945690.0, ans=0.1 2024-08-13 02:26:42,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6200, loss[loss=0.09847, beats_loss=0.009524, ecapa_loss=0.0001448, whisper_loss=0.0875, over 15619.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001684, whisper_loss=0.09114, over 3884875.89 frames. ], batch size: 59, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:27:01,725 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 02:27:03,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1946090.0, ans=0.1 2024-08-13 02:27:10,314 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 02:27:10,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1946090.0, ans=0.125 2024-08-13 02:27:22,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1946190.0, ans=0.2 2024-08-13 02:27:31,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-08-13 02:27:39,577 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.499e+01 2.801e+01 3.134e+01 4.474e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-13 02:27:40,127 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 02:27:42,887 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 02:27:50,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1946390.0, ans=6.0 2024-08-13 02:27:52,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1946390.0, ans=0.125 2024-08-13 02:28:05,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6250, loss[loss=0.1177, beats_loss=0.008901, ecapa_loss=0.0001572, whisper_loss=0.1072, over 16808.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001676, whisper_loss=0.09133, over 3861175.52 frames. ], batch size: 65, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:28:05,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1946490.0, ans=0.125 2024-08-13 02:28:05,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1946490.0, ans=0.125 2024-08-13 02:28:08,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1946490.0, ans=0.125 2024-08-13 02:28:23,810 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 02:28:33,682 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 02:28:34,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1946590.0, ans=0.125 2024-08-13 02:28:38,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1946690.0, ans=0.0 2024-08-13 02:28:46,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1946690.0, ans=0.0 2024-08-13 02:28:47,726 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 02:29:09,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1946890.0, ans=0.0 2024-08-13 02:29:11,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1946890.0, ans=0.5 2024-08-13 02:29:24,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1946890.0, ans=0.1 2024-08-13 02:29:26,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6300, loss[loss=0.114, beats_loss=0.01184, ecapa_loss=0.0001408, whisper_loss=0.1007, over 20010.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001669, whisper_loss=0.09171, over 3891205.74 frames. ], batch size: 78, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:29:45,967 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 02:29:58,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2024-08-13 02:30:00,788 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 02:30:15,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0 2024-08-13 02:30:20,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.428e+01 2.719e+01 3.075e+01 5.745e+01, threshold=5.438e+01, percent-clipped=1.0 2024-08-13 02:30:45,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6350, loss[loss=0.1261, beats_loss=0.009483, ecapa_loss=0.0001763, whisper_loss=0.1149, over 22551.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001672, whisper_loss=0.09125, over 3883435.69 frames. ], batch size: 86, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:31:10,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1947590.0, ans=0.125 2024-08-13 02:31:17,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-13 02:32:01,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1947890.0, ans=0.1 2024-08-13 02:32:07,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6400, loss[loss=0.09969, beats_loss=0.01265, ecapa_loss=0.0001549, whisper_loss=0.08548, over 23073.00 frames. ], tot_loss[loss=0.104, beats_loss=0.011, ecapa_loss=0.0001661, whisper_loss=0.09131, over 3922449.86 frames. 
], batch size: 94, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:32:12,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1947990.0, ans=0.0 2024-08-13 02:32:48,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-13 02:32:56,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1948290.0, ans=0.0 2024-08-13 02:33:01,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-13 02:33:02,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1948290.0, ans=0.0 2024-08-13 02:33:04,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.410e+01 2.725e+01 3.146e+01 5.039e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 02:33:12,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1948290.0, ans=0.125 2024-08-13 02:33:16,486 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 02:33:19,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1948390.0, ans=0.07 2024-08-13 02:33:31,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6450, loss[loss=0.1177, beats_loss=0.01031, ecapa_loss=0.0001522, whisper_loss=0.1058, over 22381.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001683, whisper_loss=0.09184, over 3928695.81 frames. 
], batch size: 86, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:33:36,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1948490.0, ans=15.0 2024-08-13 02:33:44,323 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 02:33:45,946 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 02:33:47,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1948590.0, ans=0.0 2024-08-13 02:34:03,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1948590.0, ans=0.125 2024-08-13 02:34:07,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1948690.0, ans=0.2 2024-08-13 02:34:18,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2024-08-13 02:34:25,433 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 02:34:26,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1948790.0, ans=0.1 2024-08-13 02:34:26,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1948790.0, ans=0.125 2024-08-13 02:34:26,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1948790.0, ans=0.125 2024-08-13 02:34:31,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1948790.0, ans=0.125 2024-08-13 02:34:42,471 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.089e+01 2024-08-13 02:34:49,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1948890.0, ans=0.0 2024-08-13 02:34:52,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1948890.0, ans=0.2 2024-08-13 02:34:55,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6500, loss[loss=0.1092, beats_loss=0.01127, ecapa_loss=0.0001508, whisper_loss=0.09646, over 20781.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001679, whisper_loss=0.09212, over 3923435.49 frames. ], batch size: 84, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:35:09,397 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 02:35:51,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.473e+01 2.682e+01 2.925e+01 4.435e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-13 02:36:05,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1949390.0, ans=0.0 2024-08-13 02:36:07,982 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 02:36:09,721 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 02:36:17,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6550, loss[loss=0.09436, beats_loss=0.0137, ecapa_loss=0.0001455, whisper_loss=0.0792, over 21639.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01102, ecapa_loss=0.0001669, whisper_loss=0.09124, over 3961716.68 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:36:20,040 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-13 02:36:23,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1949490.0, ans=0.125 2024-08-13 02:36:50,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1949690.0, ans=0.125 2024-08-13 02:36:56,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-13 02:37:10,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1949790.0, ans=0.2 2024-08-13 02:37:41,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6600, loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001565, whisper_loss=0.09199, over 19192.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.000168, whisper_loss=0.09174, over 3988814.19 frames. ], batch size: 73, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:37:43,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1949990.0, ans=0.125 2024-08-13 02:38:46,512 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 02:38:48,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-08-13 02:38:56,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5 2024-08-13 02:39:14,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.510e+01 2.757e+01 3.096e+01 4.067e+01, threshold=5.514e+01, percent-clipped=0.0 2024-08-13 02:39:39,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6650, loss[loss=0.1037, beats_loss=0.01122, ecapa_loss=0.0001629, whisper_loss=0.09084, over 19188.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001675, whisper_loss=0.09226, over 3969561.39 frames. ], batch size: 78, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:40:00,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1950590.0, ans=0.125 2024-08-13 02:40:10,413 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 02:40:16,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1950690.0, ans=0.0 2024-08-13 02:40:20,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-08-13 02:40:21,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-13 02:40:33,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1950790.0, ans=0.125 2024-08-13 02:40:56,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1950890.0, ans=0.125 2024-08-13 02:40:59,589 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 02:41:01,889 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 02:41:16,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6700, loss[loss=0.09116, beats_loss=0.0122, ecapa_loss=0.0001344, whisper_loss=0.07762, over 14636.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01083, ecapa_loss=0.0001662, whisper_loss=0.09309, over 3941093.31 frames. ], batch size: 58, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:41:19,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1950990.0, ans=0.125 2024-08-13 02:41:34,206 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 02:41:38,741 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 02:42:21,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2024-08-13 02:42:23,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.594e+01 2.894e+01 3.478e+01 5.381e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-13 02:42:29,438 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 02:43:00,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6750, loss[loss=0.1086, beats_loss=0.01144, ecapa_loss=0.000176, whisper_loss=0.09542, over 23106.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01077, ecapa_loss=0.0001669, whisper_loss=0.09392, over 3932924.35 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:43:04,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1951490.0, ans=0.1 2024-08-13 02:43:22,774 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 02:43:57,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1951690.0, ans=0.125 2024-08-13 02:44:09,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1951790.0, ans=0.0 2024-08-13 02:44:28,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1951790.0, ans=0.0 2024-08-13 02:44:33,103 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
17 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 02:44:43,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1951890.0, ans=0.2 2024-08-13 02:44:50,063 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 02:44:57,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6800, loss[loss=0.08917, beats_loss=0.01163, ecapa_loss=0.0001438, whisper_loss=0.0761, over 18596.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01085, ecapa_loss=0.0001676, whisper_loss=0.09276, over 3910975.34 frames. ], batch size: 75, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:45:21,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1952090.0, ans=0.0 2024-08-13 02:45:21,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1952090.0, ans=0.0 2024-08-13 02:45:21,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1952090.0, ans=0.2 2024-08-13 02:45:24,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1952090.0, ans=0.1 2024-08-13 02:45:35,106 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 02:45:35,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1952090.0, ans=0.2 2024-08-13 02:45:41,460 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.609e-03 2024-08-13 02:45:46,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1952190.0, ans=0.125 2024-08-13 02:46:11,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1952290.0, ans=0.125 2024-08-13 02:46:16,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1952290.0, ans=0.0 2024-08-13 02:46:16,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.452e+01 2.716e+01 3.076e+01 4.037e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 02:46:27,889 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 27 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-13 02:46:52,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6850, loss[loss=0.0993, beats_loss=0.008563, ecapa_loss=0.0002232, whisper_loss=0.0885, over 18609.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001683, whisper_loss=0.09203, over 3924659.24 frames. ], batch size: 74, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:46:57,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1952490.0, ans=0.125 2024-08-13 02:47:22,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.64 vs. limit=10.0 2024-08-13 02:47:47,494 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 02:48:29,306 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 02:48:30,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2024-08-13 02:48:43,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6900, loss[loss=0.1039, beats_loss=0.01181, ecapa_loss=0.0001528, whisper_loss=0.09056, over 19504.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01087, ecapa_loss=0.0001697, whisper_loss=0.09235, over 3914646.14 frames. ], batch size: 79, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:48:47,256 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 02:48:54,645 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 02:48:57,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1952990.0, ans=0.125 2024-08-13 02:49:06,353 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 02:49:07,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1953090.0, ans=0.125 2024-08-13 02:49:18,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1953190.0, ans=0.125 2024-08-13 02:49:26,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. 
limit=10.0 2024-08-13 02:49:33,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1953290.0, ans=0.125 2024-08-13 02:49:41,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.589e+01 2.754e+01 3.182e+01 2.951e+02, threshold=5.508e+01, percent-clipped=1.0 2024-08-13 02:49:45,855 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-13 02:49:54,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1953390.0, ans=0.125 2024-08-13 02:50:03,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1953390.0, ans=0.0 2024-08-13 02:50:07,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 6950, loss[loss=0.1195, beats_loss=0.01017, ecapa_loss=0.0001886, whisper_loss=0.1075, over 19178.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001689, whisper_loss=0.09252, over 3923965.73 frames. ], batch size: 80, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:50:10,789 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:50:25,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1953590.0, ans=0.0 2024-08-13 02:50:46,200 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 02:50:51,735 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 02:50:52,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1953690.0, ans=0.09899494936611666 2024-08-13 02:51:21,024 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-13 02:51:23,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1953890.0, ans=0.0 2024-08-13 02:51:41,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1953990.0, ans=0.1 2024-08-13 02:51:42,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7000, loss[loss=0.09957, beats_loss=0.01154, ecapa_loss=0.0001674, whisper_loss=0.08635, over 20226.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01085, ecapa_loss=0.0001699, whisper_loss=0.0927, over 3911998.15 frames. ], batch size: 80, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:52:02,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1954090.0, ans=0.125 2024-08-13 02:52:19,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2024-08-13 02:52:29,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1954190.0, ans=0.0 2024-08-13 02:52:48,670 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.444e+01 2.710e+01 2.918e+01 4.538e+01, threshold=5.419e+01, percent-clipped=0.0 2024-08-13 02:52:53,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1954290.0, ans=0.125 2024-08-13 02:52:57,685 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 02:53:02,728 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 26 from Vox, 16 fro AS 2024-08-13 02:53:09,812 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 02:53:16,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7050, loss[loss=0.08398, beats_loss=0.01162, ecapa_loss=0.0001731, whisper_loss=0.07063, over 15440.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.09194, over 3883017.65 frames. ], batch size: 64, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:53:17,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1954490.0, ans=0.125 2024-08-13 02:53:19,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1954490.0, ans=0.0 2024-08-13 02:53:22,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1954490.0, ans=0.125 2024-08-13 02:53:22,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-13 02:53:34,987 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 02:53:35,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.56 vs. limit=10.0 2024-08-13 02:53:41,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1954590.0, ans=0.125 2024-08-13 02:53:50,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2024-08-13 02:53:52,134 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 02:54:19,749 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-13 02:54:32,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-08-13 02:54:47,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1954990.0, ans=0.04949747468305833 2024-08-13 02:54:48,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7100, loss[loss=0.1017, beats_loss=0.01074, ecapa_loss=0.0001507, whisper_loss=0.0895, over 14003.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001687, whisper_loss=0.09094, over 3854659.24 frames. ], batch size: 54, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:55:13,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2024-08-13 02:55:52,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.441e+01 2.743e+01 3.182e+01 6.176e+01, threshold=5.486e+01, percent-clipped=2.0 2024-08-13 02:56:05,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=12.0 2024-08-13 02:56:20,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7150, loss[loss=0.08679, beats_loss=0.01544, ecapa_loss=0.0001151, whisper_loss=0.0702, over 16502.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001689, whisper_loss=0.09132, over 3882597.78 frames. ], batch size: 65, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:56:21,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1955490.0, ans=0.0 2024-08-13 02:57:09,425 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 02:57:10,811 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 02:57:53,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7200, loss[loss=0.08977, beats_loss=0.009553, ecapa_loss=0.0001805, whisper_loss=0.07842, over 17819.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001688, whisper_loss=0.09107, over 3921695.40 frames. ], batch size: 71, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:58:07,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1955990.0, ans=0.1 2024-08-13 02:58:20,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1956090.0, ans=0.0 2024-08-13 02:58:33,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=12.0 2024-08-13 02:58:40,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1956190.0, ans=0.125 2024-08-13 02:58:56,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.408e+01 2.678e+01 2.996e+01 6.633e+01, threshold=5.357e+01, percent-clipped=2.0 2024-08-13 02:58:59,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1956290.0, ans=12.0 2024-08-13 02:59:06,292 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 02:59:23,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7250, loss[loss=0.09085, beats_loss=0.01357, ecapa_loss=0.0001251, whisper_loss=0.07603, over 22672.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001697, whisper_loss=0.09156, over 3894784.12 frames. 
], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:00:01,530 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 03:00:09,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1956690.0, ans=0.125 2024-08-13 03:00:25,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1956790.0, ans=0.0 2024-08-13 03:00:47,546 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 03:00:48,954 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 03:00:53,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7300, loss[loss=0.1096, beats_loss=0.01213, ecapa_loss=0.000147, whisper_loss=0.096, over 18619.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001707, whisper_loss=0.09147, over 3901840.49 frames. ], batch size: 72, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:01:09,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1957090.0, ans=0.125 2024-08-13 03:01:15,767 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 03:01:17,036 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 03:01:27,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1957090.0, ans=0.125 2024-08-13 03:01:30,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1957190.0, ans=0.0 2024-08-13 03:01:48,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1957290.0, ans=0.0 2024-08-13 03:01:49,156 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 03:01:55,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.448e+01 2.774e+01 3.121e+01 5.439e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-13 03:02:20,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7350, loss[loss=0.09304, beats_loss=0.01247, ecapa_loss=0.0001354, whisper_loss=0.07922, over 17606.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001695, whisper_loss=0.09152, over 3868641.11 frames. ], batch size: 69, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:02:22,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. 
limit=6.0 2024-08-13 03:02:26,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1957490.0, ans=0.125 2024-08-13 03:02:31,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1957490.0, ans=0.125 2024-08-13 03:02:31,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1957490.0, ans=0.125 2024-08-13 03:02:34,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1957490.0, ans=0.0 2024-08-13 03:02:53,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1957690.0, ans=0.0 2024-08-13 03:02:55,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1957690.0, ans=0.125 2024-08-13 03:02:55,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1957690.0, ans=0.5 2024-08-13 03:03:06,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1957690.0, ans=0.125 2024-08-13 03:03:20,777 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 03:03:31,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1957890.0, ans=0.0 2024-08-13 03:03:38,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1957890.0, ans=0.0 2024-08-13 03:03:39,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.85 vs. 
limit=15.0 2024-08-13 03:03:45,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7400, loss[loss=0.1206, beats_loss=0.009675, ecapa_loss=0.0001779, whisper_loss=0.1092, over 22554.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.00017, whisper_loss=0.09138, over 3867801.29 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:04:10,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2024-08-13 03:04:19,264 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 03:04:27,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1958190.0, ans=0.125 2024-08-13 03:04:31,296 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 03:04:39,654 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 03:04:43,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1958290.0, ans=0.125 2024-08-13 03:04:44,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.515e+01 2.775e+01 3.372e+01 5.725e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 03:04:45,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1958290.0, ans=0.125 2024-08-13 03:04:56,714 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 03:05:04,184 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 03:05:09,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7450, loss[loss=0.1195, beats_loss=0.009885, ecapa_loss=0.0001512, whisper_loss=0.1081, over 22053.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001694, whisper_loss=0.09108, over 3886760.16 frames. ], batch size: 85, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:05:12,560 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 03:05:19,831 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 03:05:23,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1958490.0, ans=0.1 2024-08-13 03:05:24,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1958590.0, ans=0.125 2024-08-13 03:05:24,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1958590.0, ans=10.0 2024-08-13 03:05:56,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1958690.0, ans=0.0 2024-08-13 03:06:01,327 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-13 03:06:02,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-13 03:06:18,032 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 46 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 03:06:31,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7500, loss[loss=0.105, beats_loss=0.01058, ecapa_loss=0.0001833, whisper_loss=0.09261, over 21152.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001698, whisper_loss=0.09139, over 3897562.47 frames. ], batch size: 84, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:06:38,697 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 03:06:42,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1958990.0, ans=0.0 2024-08-13 03:06:45,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1958990.0, ans=0.125 2024-08-13 03:07:10,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1959190.0, ans=0.07 2024-08-13 03:07:27,656 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 03:07:28,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.459e+01 2.697e+01 3.000e+01 4.880e+01, threshold=5.394e+01, percent-clipped=0.0 2024-08-13 03:07:34,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1959290.0, ans=0.025 2024-08-13 03:07:52,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7550, loss[loss=0.103, beats_loss=0.01123, ecapa_loss=0.000178, whisper_loss=0.08997, over 19630.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01096, ecapa_loss=0.0001686, whisper_loss=0.09085, over 3874743.72 frames. ], batch size: 80, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:08:25,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. 
limit=15.0 2024-08-13 03:08:50,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1959790.0, ans=0.09899494936611666 2024-08-13 03:08:51,844 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-13 03:09:11,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7600, loss[loss=0.1142, beats_loss=0.00882, ecapa_loss=0.0001918, whisper_loss=0.1035, over 14329.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.000169, whisper_loss=0.09154, over 3898909.13 frames. ], batch size: 57, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:09:16,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1959990.0, ans=0.0 2024-08-13 03:09:17,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2024-08-13 03:09:41,692 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 03:09:43,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1960190.0, ans=0.125 2024-08-13 03:09:54,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1960190.0, ans=0.2 2024-08-13 03:10:08,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.552e+01 2.815e+01 3.111e+01 1.865e+02, threshold=5.629e+01, percent-clipped=3.0 2024-08-13 03:10:27,757 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 03:10:32,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7650, loss[loss=0.1029, beats_loss=0.01339, ecapa_loss=0.0001173, whisper_loss=0.08832, over 24105.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001682, whisper_loss=0.09107, over 3902068.65 frames. ], batch size: 94, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:10:34,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1960490.0, ans=0.0 2024-08-13 03:10:52,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1960590.0, ans=0.125 2024-08-13 03:11:20,202 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-13 03:11:22,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1960790.0, ans=0.0 2024-08-13 03:11:25,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1960790.0, ans=0.0 2024-08-13 03:11:27,855 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 03:11:43,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1960890.0, ans=0.2 2024-08-13 03:11:46,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2024-08-13 03:11:49,391 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 03:11:50,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-08-13 03:11:50,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7700, loss[loss=0.1027, beats_loss=0.009928, ecapa_loss=0.0001997, whisper_loss=0.09077, over 21540.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01094, ecapa_loss=0.0001675, whisper_loss=0.09148, over 3904203.62 frames. ], batch size: 89, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:11:51,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1960990.0, ans=0.125 2024-08-13 03:11:58,532 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 03:12:01,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1960990.0, ans=0.125 2024-08-13 03:12:02,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1960990.0, ans=0.125 2024-08-13 03:12:16,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.549e+01 2024-08-13 03:12:27,853 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 03:12:34,389 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 14 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 03:12:44,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.500e+01 2.815e+01 3.285e+01 6.862e+01, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 03:12:47,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2024-08-13 03:12:52,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1961390.0, ans=0.1 2024-08-13 03:12:58,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1961390.0, ans=0.125 2024-08-13 03:13:07,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1961490.0, ans=0.125 2024-08-13 03:13:08,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7750, loss[loss=0.09864, beats_loss=0.008571, ecapa_loss=0.0001683, whisper_loss=0.08839, over 18904.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.0001667, whisper_loss=0.09056, over 3875850.16 frames. ], batch size: 73, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:13:47,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1961690.0, ans=0.125 2024-08-13 03:14:00,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-13 03:14:04,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1961790.0, ans=0.025 2024-08-13 03:14:20,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-13 03:14:25,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7800, loss[loss=0.1136, beats_loss=0.01211, ecapa_loss=0.0001272, whisper_loss=0.1002, over 19424.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001672, whisper_loss=0.09109, over 3851311.48 frames. 
], batch size: 76, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:14:34,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1961990.0, ans=0.125 2024-08-13 03:14:40,331 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 03:14:40,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1962090.0, ans=0.125 2024-08-13 03:15:04,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1962190.0, ans=0.0 2024-08-13 03:15:15,756 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 03:15:19,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.419e+01 2.661e+01 3.121e+01 6.090e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 03:15:42,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1962490.0, ans=0.125 2024-08-13 03:15:43,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7850, loss[loss=0.1225, beats_loss=0.01217, ecapa_loss=0.0001218, whisper_loss=0.1091, over 21544.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001666, whisper_loss=0.09161, over 3863077.64 frames. ], batch size: 79, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:15:47,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-08-13 03:16:01,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1962590.0, ans=0.125 2024-08-13 03:16:47,099 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
31 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 03:16:53,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1962890.0, ans=0.125 2024-08-13 03:17:00,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7900, loss[loss=0.1079, beats_loss=0.0134, ecapa_loss=0.0001273, whisper_loss=0.09326, over 22853.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001667, whisper_loss=0.09179, over 3862505.43 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:17:21,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1963090.0, ans=0.125 2024-08-13 03:17:26,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1963090.0, ans=0.125 2024-08-13 03:17:34,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-13 03:17:48,051 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 03:17:52,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.438e+01 2.739e+01 3.083e+01 5.244e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-13 03:18:05,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.67 vs. limit=5.0 2024-08-13 03:18:09,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1963390.0, ans=0.0 2024-08-13 03:18:14,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 7950, loss[loss=0.1144, beats_loss=0.01022, ecapa_loss=0.0001415, whisper_loss=0.1027, over 22007.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.0001669, whisper_loss=0.09195, over 3864210.37 frames. ], batch size: 85, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:18:37,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2024-08-13 03:18:45,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1963690.0, ans=0.125 2024-08-13 03:19:03,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-08-13 03:19:12,322 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 03:19:14,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-08-13 03:19:22,323 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 03:19:24,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1963890.0, ans=0.2 2024-08-13 03:19:28,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8000, loss[loss=0.09026, beats_loss=0.01095, ecapa_loss=0.0001514, whisper_loss=0.0778, over 16691.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001674, whisper_loss=0.09243, over 3857437.76 frames. 
], batch size: 64, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:19:29,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1963990.0, ans=0.5 2024-08-13 03:19:34,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-08-13 03:19:37,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1963990.0, ans=0.1 2024-08-13 03:19:41,673 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 03:19:44,253 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 03:19:46,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1964090.0, ans=0.0 2024-08-13 03:20:21,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.304e+01 2.712e+01 2.987e+01 5.432e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 03:20:21,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1964290.0, ans=10.0 2024-08-13 03:20:21,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1964290.0, ans=0.2 2024-08-13 03:20:24,216 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 31 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 03:20:31,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1964390.0, ans=0.125 2024-08-13 03:20:34,068 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
29 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 03:20:42,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8050, loss[loss=0.1281, beats_loss=0.00983, ecapa_loss=0.0001748, whisper_loss=0.1165, over 19886.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001679, whisper_loss=0.09245, over 3832069.35 frames. ], batch size: 76, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:20:50,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1964490.0, ans=10.0 2024-08-13 03:20:54,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1964490.0, ans=0.125 2024-08-13 03:21:20,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1964690.0, ans=0.2 2024-08-13 03:21:27,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1964790.0, ans=0.1 2024-08-13 03:21:30,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1964790.0, ans=0.125 2024-08-13 03:21:31,322 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 03:21:50,866 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 03:21:51,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8100, loss[loss=0.1145, beats_loss=0.01023, ecapa_loss=0.0001841, whisper_loss=0.1024, over 22896.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001677, whisper_loss=0.09278, over 3887865.42 frames. 
], batch size: 91, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:21:53,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1964990.0, ans=0.04949747468305833 2024-08-13 03:22:24,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1965190.0, ans=0.125 2024-08-13 03:22:32,204 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 03:22:35,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1965290.0, ans=0.125 2024-08-13 03:22:39,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.433e+01 2.725e+01 3.019e+01 1.220e+02, threshold=5.449e+01, percent-clipped=1.0 2024-08-13 03:22:41,842 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 03:22:46,276 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 03:22:57,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2024-08-13 03:23:01,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8150, loss[loss=0.1195, beats_loss=0.009595, ecapa_loss=0.0001701, whisper_loss=0.1082, over 22853.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.0001707, whisper_loss=0.09224, over 3908128.95 frames. 
], batch size: 92, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:23:01,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1965490.0, ans=0.025 2024-08-13 03:23:02,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1965490.0, ans=0.125 2024-08-13 03:23:12,895 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 03:23:16,922 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 03:23:20,842 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 03:23:37,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-08-13 03:23:46,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1965790.0, ans=0.0 2024-08-13 03:24:05,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1965890.0, ans=0.07 2024-08-13 03:24:10,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8200, loss[loss=0.109, beats_loss=0.01037, ecapa_loss=0.0001686, whisper_loss=0.09699, over 23483.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01071, ecapa_loss=0.000171, whisper_loss=0.09258, over 3917835.71 frames. ], batch size: 92, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:24:34,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1966090.0, ans=0.125 2024-08-13 03:24:36,980 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 03:24:49,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1966190.0, ans=0.125 2024-08-13 03:24:56,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1966290.0, ans=0.0 2024-08-13 03:24:58,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.561e+01 2.768e+01 3.091e+01 7.365e+01, threshold=5.537e+01, percent-clipped=2.0 2024-08-13 03:25:05,445 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 03:25:19,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8250, loss[loss=0.115, beats_loss=0.01216, ecapa_loss=0.0001723, whisper_loss=0.1011, over 22171.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001698, whisper_loss=0.092, over 3906575.82 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:25:19,306 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 30 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 03:25:21,708 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 03:25:22,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1966490.0, ans=0.0 2024-08-13 03:25:31,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1966590.0, ans=0.125 2024-08-13 03:25:38,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1966590.0, ans=0.125 2024-08-13 03:25:47,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1966690.0, ans=0.125 2024-08-13 03:26:03,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1966790.0, ans=0.125 2024-08-13 03:26:05,784 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 03:26:13,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1966890.0, ans=0.125 2024-08-13 03:26:25,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8300, loss[loss=0.07851, beats_loss=0.01335, ecapa_loss=0.0001802, whisper_loss=0.06335, over 22323.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.000169, whisper_loss=0.09074, over 3873884.74 frames. ], batch size: 94, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:26:28,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1966990.0, ans=0.125 2024-08-13 03:26:37,586 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-13 03:26:37,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1967090.0, ans=0.125 2024-08-13 03:26:46,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2024-08-13 03:26:46,795 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 03:26:47,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-13 03:27:03,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1967190.0, ans=0.2 2024-08-13 03:27:10,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-13 03:27:12,899 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.397e+01 2.699e+01 2.951e+01 6.635e+01, threshold=5.397e+01, percent-clipped=2.0 2024-08-13 03:27:14,446 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 03:27:26,989 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.147e-02 2024-08-13 03:27:33,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8350, loss[loss=0.113, beats_loss=0.008817, ecapa_loss=0.0001895, whisper_loss=0.1022, over 22361.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001685, whisper_loss=0.09122, over 3880472.02 frames. 
], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:27:54,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1967590.0, ans=0.0 2024-08-13 03:27:59,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1967690.0, ans=0.125 2024-08-13 03:28:07,639 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 03:28:37,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1967890.0, ans=0.0 2024-08-13 03:28:38,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1967890.0, ans=0.125 2024-08-13 03:28:42,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8400, loss[loss=0.0979, beats_loss=0.009722, ecapa_loss=0.0002305, whisper_loss=0.08587, over 21474.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001693, whisper_loss=0.09121, over 3881943.56 frames. ], batch size: 91, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:43,874 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 03:28:58,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=1968090.0, ans=15.0 2024-08-13 03:29:02,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1968090.0, ans=0.125 2024-08-13 03:29:10,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1968190.0, ans=0.125 2024-08-13 03:29:14,670 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
14 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 03:29:17,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1968190.0, ans=0.125 2024-08-13 03:29:27,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. limit=10.0 2024-08-13 03:29:30,997 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.441e+01 2.759e+01 3.099e+01 1.310e+02, threshold=5.518e+01, percent-clipped=1.0 2024-08-13 03:29:31,232 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 03:29:37,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1968390.0, ans=0.125 2024-08-13 03:29:45,288 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 03:29:51,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8450, loss[loss=0.1107, beats_loss=0.01045, ecapa_loss=0.0001615, whisper_loss=0.09864, over 20553.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001691, whisper_loss=0.0915, over 3854101.76 frames. ], batch size: 80, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:30:01,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1968490.0, ans=0.125 2024-08-13 03:30:03,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1968490.0, ans=0.05 2024-08-13 03:30:17,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2024-08-13 03:30:49,546 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 03:30:59,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8500, loss[loss=0.1151, beats_loss=0.01128, ecapa_loss=0.000163, whisper_loss=0.1022, over 18384.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001693, whisper_loss=0.09152, over 3854375.41 frames. ], batch size: 72, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:31:05,921 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 03:31:13,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=12.0 2024-08-13 03:31:48,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.417e+01 2.734e+01 3.054e+01 8.886e+01, threshold=5.467e+01, percent-clipped=1.0 2024-08-13 03:31:48,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1969290.0, ans=0.0 2024-08-13 03:31:51,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-13 03:31:56,576 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 03:32:05,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-13 03:32:08,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8550, loss[loss=0.1126, beats_loss=0.008686, ecapa_loss=0.0001914, whisper_loss=0.102, over 14168.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001693, whisper_loss=0.09247, over 3877168.21 frames. 
], batch size: 56, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:32:59,490 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 03:33:08,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1969890.0, ans=0.1 2024-08-13 03:33:16,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8600, loss[loss=0.1001, beats_loss=0.009066, ecapa_loss=0.0001706, whisper_loss=0.0893, over 18873.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01071, ecapa_loss=0.0001693, whisper_loss=0.09273, over 3874109.42 frames. ], batch size: 75, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:33:17,159 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 03:34:02,835 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-13 03:34:06,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.434e+01 2.755e+01 2.994e+01 8.345e+01, threshold=5.511e+01, percent-clipped=1.0 2024-08-13 03:34:16,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1970390.0, ans=0.1 2024-08-13 03:34:21,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1970390.0, ans=0.125 2024-08-13 03:34:28,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8650, loss[loss=0.09763, beats_loss=0.01049, ecapa_loss=0.0001982, whisper_loss=0.08516, over 13809.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01078, ecapa_loss=0.000169, whisper_loss=0.09218, over 3850434.94 frames. 
], batch size: 56, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:34:35,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1970490.0, ans=0.125 2024-08-13 03:34:36,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1970490.0, ans=0.125 2024-08-13 03:34:37,726 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 03:34:48,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1970590.0, ans=0.125 2024-08-13 03:34:54,203 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 03:35:02,906 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 03:35:10,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1970690.0, ans=0.0 2024-08-13 03:35:10,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1970690.0, ans=0.125 2024-08-13 03:35:15,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2024-08-13 03:35:18,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2024-08-13 03:35:22,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1970790.0, ans=0.0 2024-08-13 03:35:32,551 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 03:35:43,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1970990.0, ans=0.125 2024-08-13 03:35:44,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8700, loss[loss=0.09332, beats_loss=0.01013, ecapa_loss=0.0001929, whisper_loss=0.08127, over 22772.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01072, ecapa_loss=0.0001696, whisper_loss=0.09228, over 3840528.85 frames. ], batch size: 94, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:35:54,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1970990.0, ans=15.0 2024-08-13 03:36:16,235 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-13 03:36:37,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1971290.0, ans=0.5 2024-08-13 03:36:40,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.522e+01 2.761e+01 3.315e+01 1.069e+02, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 03:36:43,274 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 03:36:49,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1971390.0, ans=0.2 2024-08-13 03:36:51,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1971390.0, ans=0.0 2024-08-13 03:36:57,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1971390.0, ans=0.2 2024-08-13 03:37:05,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8750, loss[loss=0.09373, beats_loss=0.01204, ecapa_loss=0.0001786, whisper_loss=0.0799, over 22671.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001696, whisper_loss=0.09196, over 3851893.55 frames. ], batch size: 93, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:37:10,334 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 03:37:10,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1971490.0, ans=0.125 2024-08-13 03:37:17,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1971490.0, ans=0.125 2024-08-13 03:37:26,599 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 03:37:29,262 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 03:37:57,556 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 03:38:02,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1971790.0, ans=0.0 2024-08-13 03:38:14,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1971890.0, ans=22.5 2024-08-13 03:38:24,569 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8800, loss[loss=0.09937, beats_loss=0.009299, ecapa_loss=0.0002014, whisper_loss=0.08806, over 21882.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01091, ecapa_loss=0.0001677, whisper_loss=0.09099, over 3831410.30 frames. ], batch size: 91, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:38:45,956 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 03:39:04,011 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 03:39:23,246 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.400e+01 2.713e+01 2.983e+01 4.963e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-13 03:39:36,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1972390.0, ans=0.0 2024-08-13 03:39:40,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1972390.0, ans=0.2 2024-08-13 03:39:40,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1972390.0, ans=0.0 2024-08-13 03:39:41,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.11 vs. 
limit=22.5
2024-08-13 03:39:46,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8850, loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001668, whisper_loss=0.09064, over 17277.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001669, whisper_loss=0.0915, over 3848207.15 frames. ], batch size: 68, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:40:06,064 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 from AS
2024-08-13 03:40:08,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1972590.0, ans=0.125
2024-08-13 03:40:10,349 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 03:40:19,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1972690.0, ans=0.125
2024-08-13 03:40:22,109 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 11 from LS+wenet, 15 from Vox, 40 from AS
2024-08-13 03:40:22,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1972690.0, ans=0.125
2024-08-13 03:40:36,124 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 40 from LS+wenet, 26 from Vox, 25 from AS
2024-08-13 03:40:39,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1972790.0, ans=0.2
2024-08-13 03:40:43,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1972790.0, ans=0.1
2024-08-13 03:40:52,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1972890.0, ans=0.125
2024-08-13 03:41:04,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1972890.0, ans=0.2
2024-08-13 03:41:07,380 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 03:41:08,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8900, loss[loss=0.07805, beats_loss=0.009723, ecapa_loss=0.0001544, whisper_loss=0.06679, over 15954.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001672, whisper_loss=0.09188, over 3865390.89 frames. ], batch size: 59, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:41:10,410 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS
2024-08-13 03:41:11,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0
2024-08-13 03:41:13,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=12.0
2024-08-13 03:41:17,291 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 03:41:19,180 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 from AS
2024-08-13 03:41:26,937 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 03:41:44,844 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 from AS
2024-08-13 03:41:52,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1973190.0, ans=0.0
2024-08-13 03:41:54,255 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 13 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 03:41:56,250 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 03:42:05,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0
2024-08-13 03:42:05,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.456e+01 2.768e+01 3.242e+01 5.170e+01, threshold=5.536e+01, percent-clipped=0.0
2024-08-13 03:42:29,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 8950, loss[loss=0.07934, beats_loss=0.01266, ecapa_loss=0.000121, whisper_loss=0.06547, over 15831.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001666, whisper_loss=0.09107, over 3838802.72 frames. ], batch size: 59, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:42:49,514 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS
2024-08-13 03:42:51,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1973590.0, ans=0.0
2024-08-13 03:42:59,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1973590.0, ans=0.0
2024-08-13 03:43:00,073 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-13 03:43:04,370 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 from AS
2024-08-13 03:43:14,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1973690.0, ans=0.025
2024-08-13 03:43:17,101 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 24 from Vox, 27 from AS
2024-08-13 03:43:33,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. limit=10.0
2024-08-13 03:43:36,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0
2024-08-13 03:43:40,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0
2024-08-13 03:43:47,954 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9000, loss[loss=0.1052, beats_loss=0.01239, ecapa_loss=0.000137, whisper_loss=0.09144, over 22414.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01095, ecapa_loss=0.0001675, whisper_loss=0.09031, over 3841510.02 frames. ], batch size: 90, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:43:47,955 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-13 03:44:28,222 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005752, whisper_loss=0.2484, over 922467.00 frames.
2024-08-13 03:44:46,332 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on SV_voxceleb1: loss=0.004584, beats_loss=0, ecapa_loss=0.0004584, whisper_loss=0, over 939242.00 frames.
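The per-task losses logged in these entries recombine into the reported total using the scales from the run config in the header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch, assuming that linear weighting, checked against the batch 8850 numbers above:

```python
# Assumed weighting of the per-task KD losses, taken from the config header:
# beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Recombine per-task losses into the logged total (a sketch, not icefall code)."""
    return BEATS_SCALE * beats_loss + ECAPA_SCALE * ecapa_loss + WHISPER_SCALE * whisper_loss

# Numbers from the "Epoch 14, batch 8850" entry: loss=0.1032,
# beats_loss=0.01086, ecapa_loss=0.0001668, whisper_loss=0.09064.
assert abs(total_loss(0.01086, 0.0001668, 0.09064) - 0.1032) < 5e-4
```

Note the ecapa term is two orders of magnitude smaller than the others before scaling, which is presumably why it carries the 10.0 weight.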
2024-08-13 03:46:42,122 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on AT_audioset: loss=0.02386, beats_loss=0.02386, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 03:46:42,126 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB
2024-08-13 03:46:53,992 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 03:47:24,851 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 03:47:25,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1974190.0, ans=0.125
2024-08-13 03:47:32,152 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 from AS
2024-08-13 03:47:41,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.559e+01 2.793e+01 3.240e+01 5.167e+01, threshold=5.585e+01, percent-clipped=0.0
2024-08-13 03:47:51,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1974390.0, ans=0.125
2024-08-13 03:47:53,872 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 03:47:56,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1974390.0, ans=0.125
2024-08-13 03:47:58,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1974390.0, ans=0.0
2024-08-13 03:48:07,098 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9050, loss[loss=0.1031, beats_loss=0.008031, ecapa_loss=0.000228, whisper_loss=0.09281, over 13272.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001677, whisper_loss=0.09115, over 3822150.29 frames. ], batch size: 56, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:48:22,708 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 26 from Vox, 33 from AS
2024-08-13 03:48:26,210 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 03:48:29,231 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS
2024-08-13 03:48:45,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1974690.0, ans=0.025
2024-08-13 03:48:50,543 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 from AS
2024-08-13 03:48:53,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0
2024-08-13 03:48:55,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0
2024-08-13 03:49:08,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1974790.0, ans=0.0
2024-08-13 03:49:13,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0
2024-08-13 03:49:17,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1974890.0, ans=0.125
2024-08-13 03:49:21,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1974890.0, ans=0.0
2024-08-13 03:49:28,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9100, loss[loss=0.09245, beats_loss=0.009825, ecapa_loss=0.0001394, whisper_loss=0.08123, over 14610.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.0001679, whisper_loss=0.09016, over 3797366.61 frames. ], batch size: 55, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:49:32,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1974990.0, ans=0.2
2024-08-13 03:49:43,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1975090.0, ans=0.0
2024-08-13 03:49:43,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0
2024-08-13 03:49:55,924 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-13 03:50:12,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=12.0
2024-08-13 03:50:19,119 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 19 from LS+wenet, 26 from Vox, 43 from AS
2024-08-13 03:50:26,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.465e+01 2.794e+01 3.182e+01 5.687e+01, threshold=5.588e+01, percent-clipped=1.0
2024-08-13 03:50:52,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9150, loss[loss=0.1214, beats_loss=0.0093, ecapa_loss=0.0001629, whisper_loss=0.1105, over 22957.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01087, ecapa_loss=0.0001672, whisper_loss=0.09091, over 3811090.52 frames. ], batch size: 89, lr: 4.47e-03, grad_scale: 1.152921504606847e+18
2024-08-13 03:50:59,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1975490.0, ans=0.0
2024-08-13 03:51:04,270 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS
2024-08-13 03:51:05,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0
2024-08-13 03:51:22,257 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS
2024-08-13 03:51:36,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1975690.0, ans=0.0
2024-08-13 03:51:49,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1975790.0, ans=0.1
2024-08-13 03:52:13,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9200, loss[loss=0.09494, beats_loss=0.01189, ecapa_loss=0.0001764, whisper_loss=0.08129, over 20148.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001679, whisper_loss=0.09107, over 3833574.19 frames. ], batch size: 83, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:52:47,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1976190.0, ans=0.0
2024-08-13 03:52:49,850 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 21 from Vox, 20 from AS
2024-08-13 03:53:07,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-13 03:53:11,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.444e+01 2.723e+01 3.266e+01 6.783e+01, threshold=5.446e+01, percent-clipped=1.0
2024-08-13 03:53:12,777 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 from AS
2024-08-13 03:53:15,524 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 03:53:21,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1976390.0, ans=0.125
2024-08-13 03:53:22,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-08-13 03:53:23,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1976390.0, ans=0.07
2024-08-13 03:53:32,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9250, loss[loss=0.1148, beats_loss=0.01086, ecapa_loss=0.0001751, whisper_loss=0.1022, over 21183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.000169, whisper_loss=0.09143, over 3837334.30 frames. ], batch size: 85, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:53:44,353 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 03:54:01,623 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 03:54:11,563 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 03:54:13,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1976690.0, ans=0.125
2024-08-13 03:54:17,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.41 vs. limit=10.0
2024-08-13 03:54:18,349 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 03:54:31,299 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 from AS
2024-08-13 03:54:37,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0
2024-08-13 03:54:40,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1976890.0, ans=0.0
2024-08-13 03:54:49,975 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 03:54:50,817 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 from AS
2024-08-13 03:54:53,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1976990.0, ans=0.125
2024-08-13 03:54:54,739 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9300, loss[loss=0.08911, beats_loss=0.01099, ecapa_loss=0.0001697, whisper_loss=0.07642, over 17477.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001686, whisper_loss=0.09152, over 3851626.97 frames. ], batch size: 70, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:55:01,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1976990.0, ans=0.1
2024-08-13 03:55:11,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1977090.0, ans=0.125
2024-08-13 03:55:13,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0
2024-08-13 03:55:39,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1977190.0, ans=0.1
2024-08-13 03:55:51,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1977290.0, ans=0.0
2024-08-13 03:55:55,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.460e+01 2.642e+01 2.957e+01 1.771e+02, threshold=5.283e+01, percent-clipped=2.0
2024-08-13 03:56:00,911 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS
2024-08-13 03:56:08,741 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 from AS
2024-08-13 03:56:18,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9350, loss[loss=0.1324, beats_loss=0.008051, ecapa_loss=0.0001876, whisper_loss=0.1225, over 19265.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001698, whisper_loss=0.09193, over 3865820.11 frames. ], batch size: 74, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:56:25,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1977490.0, ans=0.1
2024-08-13 03:56:29,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1977490.0, ans=0.0
2024-08-13 03:56:41,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1977590.0, ans=0.125
2024-08-13 03:56:58,855 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS
2024-08-13 03:56:59,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1977690.0, ans=0.125
2024-08-13 03:57:20,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1977790.0, ans=0.0
2024-08-13 03:57:35,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1977890.0, ans=0.1
2024-08-13 03:57:38,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9400, loss[loss=0.1204, beats_loss=0.008677, ecapa_loss=0.0002211, whisper_loss=0.1095, over 15456.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001679, whisper_loss=0.09143, over 3857001.64 frames. ], batch size: 63, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:57:57,993 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS
2024-08-13 03:58:00,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1978090.0, ans=0.1
2024-08-13 03:58:07,328 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 from AS
2024-08-13 03:58:11,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1978190.0, ans=0.125
2024-08-13 03:58:12,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1978190.0, ans=0.125
2024-08-13 03:58:14,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1978190.0, ans=0.1
2024-08-13 03:58:15,295 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 03:58:19,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1978190.0, ans=0.125
2024-08-13 03:58:28,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1978290.0, ans=0.2
2024-08-13 03:58:33,806 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 29 from Vox, 26 from AS
2024-08-13 03:58:34,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.425e+01 2.641e+01 3.063e+01 7.732e+01, threshold=5.282e+01, percent-clipped=1.0
2024-08-13 03:58:44,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1978390.0, ans=0.2
2024-08-13 03:58:45,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5
2024-08-13 03:58:57,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9450, loss[loss=0.1112, beats_loss=0.01165, ecapa_loss=0.0001687, whisper_loss=0.09789, over 14995.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001686, whisper_loss=0.09138, over 3842375.90 frames. ], batch size: 59, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 03:59:08,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1978490.0, ans=0.2
2024-08-13 03:59:15,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5
2024-08-13 03:59:25,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1978590.0, ans=0.95
2024-08-13 03:59:37,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1978690.0, ans=0.125
2024-08-13 04:00:17,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9500, loss[loss=0.1233, beats_loss=0.009756, ecapa_loss=0.000186, whisper_loss=0.1117, over 14550.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001699, whisper_loss=0.09099, over 3865782.58 frames. ], batch size: 55, lr: 4.47e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:00:25,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1978990.0, ans=0.0
2024-08-13 04:00:41,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0
2024-08-13 04:00:54,531 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 from AS
2024-08-13 04:00:54,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.701e-03
2024-08-13 04:00:59,207 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 04:01:08,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0
2024-08-13 04:01:09,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1979290.0, ans=0.1
2024-08-13 04:01:12,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.497e+01 2.737e+01 3.144e+01 1.195e+02, threshold=5.474e+01, percent-clipped=3.0
2024-08-13 04:01:13,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0
2024-08-13 04:01:21,683 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 from AS
2024-08-13 04:01:24,819 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 from AS
2024-08-13 04:01:27,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1979390.0, ans=0.125
2024-08-13 04:01:31,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1979390.0, ans=0.125
2024-08-13 04:01:33,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9550, loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.0001648, whisper_loss=0.09515, over 23190.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.000171, whisper_loss=0.09068, over 3851798.34 frames. ], batch size: 92, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:01:41,802 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 04:01:55,251 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 04:01:55,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1979590.0, ans=0.0
2024-08-13 04:02:00,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1979590.0, ans=0.1
2024-08-13 04:02:03,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1979690.0, ans=0.125
2024-08-13 04:02:10,063 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 04:02:10,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1979690.0, ans=0.2
2024-08-13 04:02:16,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0
2024-08-13 04:02:41,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1979890.0, ans=0.1
2024-08-13 04:02:44,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1979890.0, ans=0.1
2024-08-13 04:02:44,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-08-13 04:02:46,692 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9600, loss[loss=0.1039, beats_loss=0.009535, ecapa_loss=0.0001858, whisper_loss=0.09254, over 21382.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001711, whisper_loss=0.09078, over 3831604.65 frames. ], batch size: 88, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:02:59,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1980090.0, ans=0.125
2024-08-13 04:03:00,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1980090.0, ans=0.0
2024-08-13 04:03:03,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1980090.0, ans=0.0
2024-08-13 04:03:06,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. limit=5.0
2024-08-13 04:03:19,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1980190.0, ans=15.0
2024-08-13 04:03:30,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0
2024-08-13 04:03:36,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.597e+01 2.785e+01 3.117e+01 4.817e+01, threshold=5.569e+01, percent-clipped=0.0
2024-08-13 04:03:47,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0
2024-08-13 04:03:50,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5
2024-08-13 04:03:54,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1980490.0, ans=0.2
2024-08-13 04:03:55,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9650, loss[loss=0.1077, beats_loss=0.009355, ecapa_loss=0.0001543, whisper_loss=0.0968, over 19535.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001695, whisper_loss=0.09116, over 3816833.16 frames. ], batch size: 76, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:04:03,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1980490.0, ans=0.125
2024-08-13 04:04:03,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1980490.0, ans=0.0
2024-08-13 04:04:03,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0
2024-08-13 04:04:24,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1980690.0, ans=0.2
2024-08-13 04:04:24,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0
2024-08-13 04:04:35,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1980690.0, ans=0.2
2024-08-13 04:04:36,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1980790.0, ans=10.0
2024-08-13 04:04:52,622 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 28 from Vox, 27 from AS
2024-08-13 04:05:00,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1980890.0, ans=0.125
2024-08-13 04:05:04,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1980990.0, ans=0.1
2024-08-13 04:05:05,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9700, loss[loss=0.1046, beats_loss=0.009918, ecapa_loss=0.0001631, whisper_loss=0.09305, over 22786.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001708, whisper_loss=0.09104, over 3847478.93 frames. ], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:05:19,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1981090.0, ans=0.0
2024-08-13 04:05:28,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1981090.0, ans=0.125
2024-08-13 04:05:49,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.10 vs. limit=10.0
2024-08-13 04:05:55,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.457e+01 2.661e+01 2.979e+01 4.854e+01, threshold=5.323e+01, percent-clipped=0.0
2024-08-13 04:05:59,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1981290.0, ans=0.0
2024-08-13 04:06:14,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9750, loss[loss=0.1121, beats_loss=0.01152, ecapa_loss=0.0001658, whisper_loss=0.09888, over 21399.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001695, whisper_loss=0.09047, over 3848908.29 frames. ], batch size: 87, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:06:16,336 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS
2024-08-13 04:06:24,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1981490.0, ans=0.125
2024-08-13 04:06:25,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0
2024-08-13 04:06:52,076 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 30 from Vox, 39 from AS
2024-08-13 04:06:52,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1981690.0, ans=0.125
2024-08-13 04:06:53,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1981690.0, ans=0.0
2024-08-13 04:06:55,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1981790.0, ans=0.125
2024-08-13 04:07:11,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=12.0
2024-08-13 04:07:15,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1981890.0, ans=0.0
2024-08-13 04:07:24,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9800, loss[loss=0.1082, beats_loss=0.01353, ecapa_loss=0.0001254, whisper_loss=0.09345, over 23627.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001699, whisper_loss=0.09039, over 3853246.78 frames. ], batch size: 92, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:07:36,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1981990.0, ans=0.2
2024-08-13 04:07:40,275 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS
2024-08-13 04:07:56,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1982190.0, ans=0.0
2024-08-13 04:08:04,291 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 25 from Vox, 27 from AS
2024-08-13 04:08:04,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1982190.0, ans=0.1
2024-08-13 04:08:07,232 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 04:08:09,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2024-08-13 04:08:15,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.387e+01 2.562e+01 2.934e+01 4.315e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-13 04:08:29,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1982390.0, ans=0.025
2024-08-13 04:08:29,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0
2024-08-13 04:08:34,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9850, loss[loss=0.1486, beats_loss=0.008141, ecapa_loss=0.0001907, whisper_loss=0.1386, over 23547.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001703, whisper_loss=0.09115, over 3877994.62 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 5.764607523034235e+17
2024-08-13 04:08:54,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1982590.0, ans=0.0
2024-08-13 04:08:55,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1982590.0, ans=0.1
2024-08-13 04:09:07,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=12.0
2024-08-13 04:09:31,828 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS
2024-08-13 04:09:39,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1982890.0, ans=0.2
2024-08-13 04:09:41,604 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 from AS
2024-08-13 04:09:44,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9900, loss[loss=0.1059, beats_loss=0.01016, ecapa_loss=0.0002004, whisper_loss=0.09374, over 21876.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.000171, whisper_loss=0.09143, over 3877663.86 frames.
], batch size: 90, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:09:57,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1983090.0, ans=0.125 2024-08-13 04:10:02,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1983090.0, ans=0.1 2024-08-13 04:10:20,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1983190.0, ans=0.1 2024-08-13 04:10:23,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1983190.0, ans=0.125 2024-08-13 04:10:30,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1983290.0, ans=0.1 2024-08-13 04:10:34,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.489e+01 2.832e+01 3.268e+01 9.650e+01, threshold=5.664e+01, percent-clipped=3.0 2024-08-13 04:10:35,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1983290.0, ans=0.0 2024-08-13 04:10:53,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 9950, loss[loss=0.1009, beats_loss=0.01247, ecapa_loss=0.0001464, whisper_loss=0.087, over 15535.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001697, whisper_loss=0.09167, over 3869401.88 frames. ], batch size: 61, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:10:59,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5 2024-08-13 04:11:11,210 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 04:11:13,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1983590.0, ans=0.125 2024-08-13 04:11:23,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1983690.0, ans=0.125 2024-08-13 04:11:37,174 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 04:11:57,681 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 04:12:01,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10000, loss[loss=0.1282, beats_loss=0.008602, ecapa_loss=0.0001759, whisper_loss=0.1178, over 21162.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001688, whisper_loss=0.09153, over 3844505.16 frames. ], batch size: 79, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:12:36,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-13 04:12:52,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.356e+01 2.631e+01 2.870e+01 5.046e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-13 04:13:09,022 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 04:13:09,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1984390.0, ans=0.2 2024-08-13 04:13:11,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10050, loss[loss=0.1126, beats_loss=0.01105, ecapa_loss=0.0001492, whisper_loss=0.1001, over 23893.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001683, whisper_loss=0.09162, over 3879765.85 frames. 
], batch size: 94, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:13:13,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1984490.0, ans=0.0 2024-08-13 04:13:25,540 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-13 04:13:59,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1984790.0, ans=0.125 2024-08-13 04:14:08,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1984890.0, ans=0.125 2024-08-13 04:14:11,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2024-08-13 04:14:12,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1984890.0, ans=0.125 2024-08-13 04:14:20,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10100, loss[loss=0.1097, beats_loss=0.01162, ecapa_loss=0.0001876, whisper_loss=0.09623, over 21807.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09202, over 3908278.46 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:14:38,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1985090.0, ans=0.0 2024-08-13 04:14:43,030 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 04:14:44,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1985090.0, ans=0.2 2024-08-13 04:14:58,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1985190.0, ans=0.5 2024-08-13 04:15:01,119 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 04:15:10,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.445e+01 2.629e+01 3.089e+01 3.463e+02, threshold=5.257e+01, percent-clipped=1.0 2024-08-13 04:15:12,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1985290.0, ans=0.0 2024-08-13 04:15:29,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10150, loss[loss=0.1123, beats_loss=0.01102, ecapa_loss=0.0001527, whisper_loss=0.09979, over 23365.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.000168, whisper_loss=0.09203, over 3894277.07 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:15:35,096 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 04:15:38,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1985490.0, ans=0.125 2024-08-13 04:15:48,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2024-08-13 04:16:06,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1985690.0, ans=0.0 2024-08-13 04:16:38,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10200, loss[loss=0.1059, beats_loss=0.01103, ecapa_loss=0.0001567, whisper_loss=0.09329, over 22476.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001671, whisper_loss=0.09161, over 3876585.82 frames. ], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:16:46,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1985990.0, ans=0.0 2024-08-13 04:16:51,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1986090.0, ans=0.125 2024-08-13 04:16:52,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1986090.0, ans=0.0 2024-08-13 04:16:55,405 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 04:16:59,285 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 04:17:11,869 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 04:17:17,306 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 04:17:19,892 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 04:17:20,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1986290.0, ans=0.0 2024-08-13 04:17:21,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1986290.0, ans=0.125 2024-08-13 04:17:27,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.440e+01 2.685e+01 3.230e+01 3.990e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 04:17:28,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1986290.0, ans=0.1 2024-08-13 04:17:45,912 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 04:17:46,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10250, loss[loss=0.09238, beats_loss=0.01297, ecapa_loss=0.0001683, whisper_loss=0.07773, over 18237.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001647, whisper_loss=0.09145, over 3907167.53 frames. ], batch size: 77, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:18:03,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2024-08-13 04:18:33,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-13 04:18:48,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-08-13 04:18:55,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10300, loss[loss=0.07568, beats_loss=0.01269, ecapa_loss=0.0002053, whisper_loss=0.06093, over 14776.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001668, whisper_loss=0.09119, over 3912919.19 frames. ], batch size: 62, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:19:07,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1986990.0, ans=0.125 2024-08-13 04:19:32,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1987190.0, ans=0.0 2024-08-13 04:19:38,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-13 04:19:44,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.484e+01 2.743e+01 3.118e+01 4.422e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 04:19:50,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1987390.0, ans=0.0 2024-08-13 04:19:53,910 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 04:19:55,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-13 04:20:03,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10350, loss[loss=0.1087, beats_loss=0.01046, ecapa_loss=0.0001716, whisper_loss=0.09655, over 20825.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001668, whisper_loss=0.09157, over 3932590.72 frames. 
], batch size: 85, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:20:05,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1987490.0, ans=0.2 2024-08-13 04:20:34,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-08-13 04:20:36,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1987690.0, ans=0.125 2024-08-13 04:20:37,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1987690.0, ans=0.2 2024-08-13 04:20:52,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1987790.0, ans=0.125 2024-08-13 04:21:07,019 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 04:21:11,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10400, loss[loss=0.1193, beats_loss=0.009102, ecapa_loss=0.000182, whisper_loss=0.1084, over 22037.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01087, ecapa_loss=0.0001671, whisper_loss=0.09189, over 3921605.56 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:21:27,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1988090.0, ans=0.125 2024-08-13 04:21:28,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1988090.0, ans=0.0 2024-08-13 04:21:40,290 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 04:21:52,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1988290.0, ans=0.125 2024-08-13 04:21:58,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-13 04:22:01,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.435e+01 2.770e+01 3.094e+01 5.065e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 04:22:11,665 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 04:22:11,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1988390.0, ans=0.125 2024-08-13 04:22:21,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2024-08-13 04:22:21,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10450, loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001431, whisper_loss=0.09291, over 16656.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001674, whisper_loss=0.09145, over 3907562.70 frames. ], batch size: 66, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:22:30,128 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.643e-02 2024-08-13 04:22:31,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2024-08-13 04:22:36,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1988590.0, ans=0.1 2024-08-13 04:22:48,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1988690.0, ans=0.125 2024-08-13 04:22:57,325 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 04:23:09,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-08-13 04:23:12,490 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 04:23:12,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1988790.0, ans=0.0 2024-08-13 04:23:30,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10500, loss[loss=0.09263, beats_loss=0.01146, ecapa_loss=0.0001533, whisper_loss=0.07964, over 14874.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001683, whisper_loss=0.09132, over 3878822.08 frames. ], batch size: 56, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:23:30,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1988990.0, ans=0.0 2024-08-13 04:23:32,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1988990.0, ans=0.025 2024-08-13 04:23:41,348 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 04:23:59,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. 
limit=15.0 2024-08-13 04:24:05,623 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 16 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-13 04:24:21,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.377e+01 2.646e+01 2.972e+01 5.578e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-13 04:24:26,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1989390.0, ans=0.125 2024-08-13 04:24:43,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10550, loss[loss=0.1272, beats_loss=0.009434, ecapa_loss=0.0001614, whisper_loss=0.1161, over 21036.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01096, ecapa_loss=0.0001683, whisper_loss=0.09059, over 3883524.62 frames. ], batch size: 80, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:24:47,939 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 04:24:50,723 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 04:24:52,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1989490.0, ans=0.1 2024-08-13 04:24:53,670 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 04:24:58,054 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 16 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 04:25:07,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1989590.0, ans=15.0 2024-08-13 04:25:26,682 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-13 04:25:28,502 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 04:25:29,195 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.068e+02 2024-08-13 04:25:40,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-08-13 04:25:47,351 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 04:26:00,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10600, loss[loss=0.1108, beats_loss=0.01162, ecapa_loss=0.00013, whisper_loss=0.09793, over 22967.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0111, ecapa_loss=0.0001675, whisper_loss=0.08965, over 3874243.80 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:26:00,468 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 04:26:02,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1989990.0, ans=0.125 2024-08-13 04:26:19,847 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 04:26:26,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1990090.0, ans=0.0 2024-08-13 04:26:38,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.12 vs. limit=10.0 2024-08-13 04:26:54,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.291e+01 2.645e+01 2.934e+01 5.325e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-13 04:26:57,595 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 04:27:08,466 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 04:27:11,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1990390.0, ans=0.125 2024-08-13 04:27:15,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10650, loss[loss=0.1012, beats_loss=0.008353, ecapa_loss=0.0002198, whisper_loss=0.09061, over 19435.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01101, ecapa_loss=0.000167, whisper_loss=0.09055, over 3875157.54 frames. ], batch size: 80, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:27:16,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1990490.0, ans=0.2 2024-08-13 04:27:18,153 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 04:27:33,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1990590.0, ans=0.025 2024-08-13 04:27:38,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1990590.0, ans=0.0 2024-08-13 04:27:41,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1990590.0, ans=0.125 2024-08-13 04:27:46,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1990690.0, ans=0.1 2024-08-13 04:28:00,271 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:28:10,890 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-13 04:28:17,608 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 04:28:32,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=12.0 2024-08-13 04:28:35,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10700, loss[loss=0.1026, beats_loss=0.009823, ecapa_loss=0.0001438, whisper_loss=0.09129, over 23149.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001658, whisper_loss=0.09125, over 3906006.97 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:28:43,615 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 04:28:45,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1990990.0, ans=0.125 2024-08-13 04:29:07,092 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 04:29:09,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1991190.0, ans=0.1 2024-08-13 04:29:10,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1991190.0, ans=0.0 2024-08-13 04:29:13,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1991190.0, ans=0.09899494936611666 2024-08-13 04:29:19,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1991190.0, ans=0.1 2024-08-13 04:29:30,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.433e+01 2.666e+01 3.252e+01 5.472e+01, threshold=5.332e+01, percent-clipped=1.0 2024-08-13 04:29:40,729 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 04:29:45,339 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 04:29:51,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1991490.0, ans=0.125 2024-08-13 04:29:52,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10750, loss[loss=0.1253, beats_loss=0.009736, ecapa_loss=0.0001301, whisper_loss=0.1143, over 22773.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001678, whisper_loss=0.09178, over 3883501.89 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:29:53,317 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 04:29:55,156 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 04:30:12,484 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 04:30:35,090 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 04:30:41,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1991790.0, ans=0.0 2024-08-13 04:31:09,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1991890.0, ans=0.125 2024-08-13 04:31:13,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10800, loss[loss=0.1153, beats_loss=0.01108, ecapa_loss=0.0001694, whisper_loss=0.1025, over 14808.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01084, ecapa_loss=0.0001683, whisper_loss=0.0926, over 3925207.84 frames. 
], batch size: 58, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:31:19,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1991990.0, ans=0.1 2024-08-13 04:31:36,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1992090.0, ans=0.1 2024-08-13 04:31:49,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1992190.0, ans=0.2 2024-08-13 04:31:59,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-13 04:32:00,325 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 04:32:05,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1992290.0, ans=0.2 2024-08-13 04:32:10,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.408e+01 2.896e+01 3.475e+01 4.951e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 04:32:29,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1992390.0, ans=0.0 2024-08-13 04:32:32,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1992490.0, ans=0.0 2024-08-13 04:32:32,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10850, loss[loss=0.09239, beats_loss=0.0122, ecapa_loss=0.0001929, whisper_loss=0.07827, over 21620.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001677, whisper_loss=0.09196, over 3912029.44 frames. 
], batch size: 92, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:32:36,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1992490.0, ans=0.125 2024-08-13 04:32:43,660 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 04:32:48,956 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 04:33:13,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1992690.0, ans=0.035 2024-08-13 04:33:13,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs. limit=10.0 2024-08-13 04:33:19,154 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 04:33:22,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0 2024-08-13 04:33:23,724 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 04:33:25,425 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 04:33:28,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-08-13 04:33:49,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1992890.0, ans=10.0 2024-08-13 04:33:51,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10900, loss[loss=0.1074, beats_loss=0.01114, ecapa_loss=0.0001574, whisper_loss=0.09471, over 22158.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001676, whisper_loss=0.09163, over 3936303.76 frames. ], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:34:10,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1993090.0, ans=0.125 2024-08-13 04:34:30,301 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.271e-01 2024-08-13 04:34:31,506 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 16 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-13 04:34:33,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1993190.0, ans=0.0 2024-08-13 04:34:45,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1993290.0, ans=0.0 2024-08-13 04:34:51,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-13 04:34:51,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.538e+01 2.794e+01 3.172e+01 4.370e+01, threshold=5.589e+01, percent-clipped=0.0 2024-08-13 04:35:12,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 10950, loss[loss=0.1169, beats_loss=0.00907, ecapa_loss=0.0002053, whisper_loss=0.1058, over 15842.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001678, whisper_loss=0.09182, over 3928061.10 frames. ], batch size: 66, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:35:13,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. 
limit=15.0 2024-08-13 04:35:16,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1993490.0, ans=0.0 2024-08-13 04:35:42,115 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 04:36:06,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1993790.0, ans=0.0 2024-08-13 04:36:08,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1993790.0, ans=0.0 2024-08-13 04:36:10,898 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-13 04:36:29,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-13 04:36:33,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11000, loss[loss=0.102, beats_loss=0.01103, ecapa_loss=0.0001831, whisper_loss=0.08916, over 18211.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.000168, whisper_loss=0.09187, over 3939354.87 frames. ], batch size: 75, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:36:43,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2024-08-13 04:36:43,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1993990.0, ans=0.125 2024-08-13 04:36:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1994090.0, ans=0.0 2024-08-13 04:36:57,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.47 vs. 
limit=10.0 2024-08-13 04:37:18,344 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 04:37:29,293 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 04:37:33,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.603e+01 2.980e+01 9.171e+01, threshold=5.207e+01, percent-clipped=2.0 2024-08-13 04:37:37,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1994390.0, ans=0.125 2024-08-13 04:37:54,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11050, loss[loss=0.09574, beats_loss=0.01059, ecapa_loss=0.0001959, whisper_loss=0.08319, over 19127.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01087, ecapa_loss=0.0001679, whisper_loss=0.09183, over 3948548.84 frames. ], batch size: 83, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:38:10,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1994590.0, ans=0.1 2024-08-13 04:38:12,830 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 04:38:13,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. 
limit=22.5 2024-08-13 04:38:19,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1994590.0, ans=0.125 2024-08-13 04:38:34,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1994690.0, ans=0.0 2024-08-13 04:38:50,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1994790.0, ans=0.0 2024-08-13 04:38:59,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1994890.0, ans=0.0 2024-08-13 04:39:18,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11100, loss[loss=0.09048, beats_loss=0.01214, ecapa_loss=0.0001771, whisper_loss=0.07657, over 21315.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01082, ecapa_loss=0.0001684, whisper_loss=0.09186, over 3941670.51 frames. ], batch size: 91, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:39:48,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=12.0 2024-08-13 04:39:49,200 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 04:40:01,792 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 04:40:23,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.346e+01 2.633e+01 2.953e+01 4.555e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 04:40:36,866 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
24 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 04:40:44,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1995390.0, ans=0.5 2024-08-13 04:40:52,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11150, loss[loss=0.08726, beats_loss=0.01092, ecapa_loss=0.0001154, whisper_loss=0.07519, over 14539.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001674, whisper_loss=0.09121, over 3904942.82 frames. ], batch size: 54, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:40:55,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-13 04:41:11,552 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 04:41:36,109 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 04:41:49,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1995690.0, ans=0.1 2024-08-13 04:41:53,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1995690.0, ans=0.125 2024-08-13 04:42:22,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1995890.0, ans=0.0 2024-08-13 04:42:26,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. 
limit=15.0 2024-08-13 04:42:39,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1995990.0, ans=0.125 2024-08-13 04:42:40,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11200, loss[loss=0.08274, beats_loss=0.01063, ecapa_loss=0.0002084, whisper_loss=0.07002, over 17383.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001675, whisper_loss=0.09173, over 3893560.98 frames. ], batch size: 73, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:42:40,884 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-13 04:42:41,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1995990.0, ans=0.125 2024-08-13 04:43:17,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1996090.0, ans=0.1 2024-08-13 04:43:20,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1996090.0, ans=0.0 2024-08-13 04:43:32,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1996190.0, ans=0.05 2024-08-13 04:44:12,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.527e+01 2.790e+01 3.048e+01 4.600e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 04:44:15,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1996290.0, ans=0.04949747468305833 2024-08-13 04:44:47,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11250, loss[loss=0.1177, beats_loss=0.009879, ecapa_loss=0.0001622, whisper_loss=0.1062, over 23006.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001667, whisper_loss=0.09155, over 3909393.49 frames. 
], batch size: 91, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:45:02,972 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:45:05,781 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-13 04:45:14,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1996590.0, ans=0.1 2024-08-13 04:45:16,714 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 04:45:26,880 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 04:46:32,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1996890.0, ans=0.125 2024-08-13 04:46:41,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1996890.0, ans=0.0 2024-08-13 04:46:52,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11300, loss[loss=0.09284, beats_loss=0.01161, ecapa_loss=0.0001067, whisper_loss=0.08016, over 16275.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.000165, whisper_loss=0.09157, over 3886689.73 frames. ], batch size: 60, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:47:27,255 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 04:47:27,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1997090.0, ans=0.2 2024-08-13 04:47:33,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1997090.0, ans=15.0 2024-08-13 04:47:53,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1997190.0, ans=0.2 2024-08-13 04:48:14,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1997290.0, ans=0.0 2024-08-13 04:48:27,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.451e+01 2.765e+01 3.179e+01 5.185e+01, threshold=5.530e+01, percent-clipped=0.0 2024-08-13 04:48:39,938 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 04:48:45,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1997390.0, ans=0.2 2024-08-13 04:48:48,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. limit=5.0 2024-08-13 04:48:59,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11350, loss[loss=0.126, beats_loss=0.009602, ecapa_loss=0.0001879, whisper_loss=0.1146, over 22894.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001658, whisper_loss=0.092, over 3913831.89 frames. ], batch size: 90, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:49:05,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1997490.0, ans=0.2 2024-08-13 04:49:07,255 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 04:49:07,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1997490.0, ans=0.125 2024-08-13 04:49:32,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1997590.0, ans=0.125 2024-08-13 04:49:39,899 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 17 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 04:49:45,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1997690.0, ans=0.125 2024-08-13 04:49:46,839 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 04:49:58,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2024-08-13 04:50:11,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1997890.0, ans=0.125 2024-08-13 04:50:21,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1997890.0, ans=0.125 2024-08-13 04:50:29,809 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11400, loss[loss=0.1089, beats_loss=0.008539, ecapa_loss=0.0001819, whisper_loss=0.09859, over 14330.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001658, whisper_loss=0.09201, over 3901919.66 frames. 
], batch size: 56, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:50:39,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1997990.0, ans=0.0 2024-08-13 04:50:41,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1997990.0, ans=0.125 2024-08-13 04:50:56,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1998090.0, ans=0.1 2024-08-13 04:51:10,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1998190.0, ans=0.2 2024-08-13 04:51:39,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.469e+01 2.790e+01 3.072e+01 4.491e+01, threshold=5.580e+01, percent-clipped=0.0 2024-08-13 04:51:48,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1998390.0, ans=0.09899494936611666 2024-08-13 04:52:03,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11450, loss[loss=0.09616, beats_loss=0.01009, ecapa_loss=0.0001472, whisper_loss=0.0846, over 16787.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01082, ecapa_loss=0.0001656, whisper_loss=0.09296, over 3935366.12 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:52:08,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1998490.0, ans=0.1 2024-08-13 04:52:16,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2024-08-13 04:52:20,539 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-13 04:52:23,753 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 04:52:31,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1998590.0, ans=0.125 2024-08-13 04:52:34,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-08-13 04:52:50,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1998690.0, ans=0.125 2024-08-13 04:52:50,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1998690.0, ans=0.2 2024-08-13 04:53:04,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1998790.0, ans=0.0 2024-08-13 04:53:11,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1998790.0, ans=0.0 2024-08-13 04:53:16,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1998890.0, ans=0.0 2024-08-13 04:53:36,242 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 04:53:38,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11500, loss[loss=0.1, beats_loss=0.01226, ecapa_loss=0.0001454, whisper_loss=0.08633, over 23121.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001644, whisper_loss=0.09237, over 3915120.53 frames. ], batch size: 94, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:53:44,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. 
limit=15.0 2024-08-13 04:53:58,260 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 04:53:59,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1999090.0, ans=0.0 2024-08-13 04:54:24,505 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 04:54:39,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1999290.0, ans=0.1 2024-08-13 04:54:45,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.530e+01 2.837e+01 3.156e+01 6.576e+01, threshold=5.675e+01, percent-clipped=1.0 2024-08-13 04:54:47,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1999290.0, ans=0.125 2024-08-13 04:54:53,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1999390.0, ans=0.0 2024-08-13 04:54:53,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-13 04:54:59,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1999390.0, ans=0.0 2024-08-13 04:55:07,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11550, loss[loss=0.1169, beats_loss=0.01076, ecapa_loss=0.0001685, whisper_loss=0.1044, over 18659.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001651, whisper_loss=0.09213, over 3897366.10 frames. ], batch size: 71, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:55:15,221 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 04:55:16,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1999490.0, ans=0.125 2024-08-13 04:55:23,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1999490.0, ans=0.0 2024-08-13 04:55:42,850 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 04:56:04,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1999790.0, ans=15.0 2024-08-13 04:56:31,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1999890.0, ans=0.1 2024-08-13 04:56:31,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1999890.0, ans=0.0 2024-08-13 04:56:33,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1999890.0, ans=0.125 2024-08-13 04:56:40,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11600, loss[loss=0.08455, beats_loss=0.01379, ecapa_loss=0.0001372, whisper_loss=0.06939, over 14316.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001657, whisper_loss=0.09223, over 3898523.40 frames. ], batch size: 57, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:57:04,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2000090.0, ans=0.2 2024-08-13 04:57:05,922 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 04:57:24,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2000190.0, ans=0.2 2024-08-13 04:57:38,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2000190.0, ans=0.125 2024-08-13 04:57:56,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.423e+01 2.636e+01 2.832e+01 7.836e+01, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 04:58:22,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11650, loss[loss=0.08854, beats_loss=0.01245, ecapa_loss=0.0001774, whisper_loss=0.07432, over 17774.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001655, whisper_loss=0.09265, over 3899281.47 frames. ], batch size: 69, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:58:40,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2000590.0, ans=0.1 2024-08-13 04:58:41,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2000590.0, ans=0.125 2024-08-13 04:58:54,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2000590.0, ans=0.2 2024-08-13 04:58:59,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2000690.0, ans=0.0 2024-08-13 04:59:26,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2000790.0, ans=0.0 2024-08-13 04:59:37,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=12.0 2024-08-13 04:59:54,224 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 04:59:56,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11700, loss[loss=0.08944, beats_loss=0.01168, ecapa_loss=0.000197, whisper_loss=0.07579, over 19622.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001658, whisper_loss=0.09218, over 3919894.39 frames. ], batch size: 87, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:00:05,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-13 05:00:13,169 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-13 05:00:21,444 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 05:00:29,282 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-13 05:00:54,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2001290.0, ans=0.1 2024-08-13 05:01:07,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.358e+01 2.707e+01 3.132e+01 5.516e+01, threshold=5.414e+01, percent-clipped=1.0 2024-08-13 05:01:18,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2001390.0, ans=0.07 2024-08-13 05:01:26,695 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 05:01:30,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11750, loss[loss=0.06418, beats_loss=0.01354, ecapa_loss=0.000127, whisper_loss=0.04937, over 16702.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001654, whisper_loss=0.09239, over 3917814.43 frames. 
], batch size: 67, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:01:31,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2001490.0, ans=0.125 2024-08-13 05:01:39,688 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 05:01:48,981 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 05:01:54,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2001590.0, ans=0.125 2024-08-13 05:02:14,936 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 05:02:17,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2001690.0, ans=0.125 2024-08-13 05:02:40,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2001790.0, ans=0.125 2024-08-13 05:02:42,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2001790.0, ans=0.0 2024-08-13 05:02:46,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2024-08-13 05:02:49,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2001890.0, ans=0.0 2024-08-13 05:02:54,356 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 05:03:03,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11800, loss[loss=0.1001, beats_loss=0.01212, ecapa_loss=0.0001459, whisper_loss=0.08656, over 14464.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.0001648, whisper_loss=0.09247, over 3935453.79 frames. ], batch size: 55, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:03:03,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2001990.0, ans=0.125 2024-08-13 05:03:09,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2001990.0, ans=0.125 2024-08-13 05:03:36,262 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 05:04:06,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.539e+01 2.830e+01 3.148e+01 9.366e+01, threshold=5.659e+01, percent-clipped=1.0 2024-08-13 05:04:11,850 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 05:04:15,682 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:04:18,528 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 05:04:20,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2002390.0, ans=0.1 2024-08-13 05:04:24,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2002390.0, ans=0.0 2024-08-13 05:04:29,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11850, loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001776, whisper_loss=0.09108, over 20569.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001646, whisper_loss=0.09228, over 3934785.30 frames. 
], batch size: 84, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:04:43,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2002490.0, ans=0.0 2024-08-13 05:04:47,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2002590.0, ans=0.1 2024-08-13 05:05:02,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-08-13 05:05:24,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2002790.0, ans=0.0 2024-08-13 05:05:41,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2002890.0, ans=0.125 2024-08-13 05:05:46,473 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 05:05:53,460 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 05:05:57,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11900, loss[loss=0.09824, beats_loss=0.01198, ecapa_loss=0.0001997, whisper_loss=0.08426, over 20265.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001669, whisper_loss=0.09224, over 3951025.63 frames. ], batch size: 88, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:06:01,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. 
limit=10.0 2024-08-13 05:06:08,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2002990.0, ans=0.0 2024-08-13 05:06:11,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2002990.0, ans=0.125 2024-08-13 05:06:40,679 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 05:06:41,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2003190.0, ans=0.0 2024-08-13 05:06:48,337 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 05:07:00,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.520e+01 2.675e+01 3.005e+01 5.998e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 05:07:16,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-08-13 05:07:22,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2003490.0, ans=0.125 2024-08-13 05:07:23,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 11950, loss[loss=0.1066, beats_loss=0.01082, ecapa_loss=0.0001739, whisper_loss=0.09404, over 15967.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001674, whisper_loss=0.09166, over 3897417.33 frames. 
], batch size: 62, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:07:49,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2003590.0, ans=0.125 2024-08-13 05:07:57,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2003690.0, ans=0.125 2024-08-13 05:08:32,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2003890.0, ans=10.0 2024-08-13 05:08:49,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12000, loss[loss=0.09157, beats_loss=0.01335, ecapa_loss=0.000148, whisper_loss=0.07674, over 22325.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001678, whisper_loss=0.09139, over 3887144.70 frames. ], batch size: 90, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:08:49,872 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 05:09:29,216 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005731, whisper_loss=0.2468, over 922467.00 frames. 2024-08-13 05:09:48,279 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on SV_voxceleb1: loss=0.004602, beats_loss=0, ecapa_loss=0.0004602, whisper_loss=0, over 939242.00 frames. 2024-08-13 05:11:41,035 INFO [train_multi_KD3.py:1149] (3/4) Epoch 14, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 05:11:41,039 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 05:11:45,510 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 05:12:07,785 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 05:12:15,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2004190.0, ans=0.125 2024-08-13 05:12:20,265 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 05:12:20,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2024-08-13 05:12:23,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2004190.0, ans=0.0 2024-08-13 05:12:30,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2004290.0, ans=0.2 2024-08-13 05:12:30,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2004290.0, ans=0.2 2024-08-13 05:12:32,064 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 05:12:32,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=10.0 2024-08-13 05:12:43,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.475e+01 2.665e+01 3.111e+01 1.048e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 05:12:44,041 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.282e+01 2024-08-13 05:13:02,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2004390.0, ans=0.0 2024-08-13 05:13:04,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12050, loss[loss=0.09071, beats_loss=0.01398, ecapa_loss=0.0001382, whisper_loss=0.07535, over 17812.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001673, whisper_loss=0.09145, over 3865254.10 frames. ], batch size: 73, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:13:05,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2004490.0, ans=0.125 2024-08-13 05:13:09,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2004490.0, ans=0.125 2024-08-13 05:13:11,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004490.0, ans=0.1 2024-08-13 05:13:57,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-13 05:14:26,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004890.0, ans=0.1 2024-08-13 05:14:28,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12100, loss[loss=0.1079, beats_loss=0.009862, ecapa_loss=0.0002023, whisper_loss=0.09602, over 16331.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001683, whisper_loss=0.09127, over 3866617.69 frames. ], batch size: 70, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:14:30,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004990.0, ans=0.1 2024-08-13 05:14:40,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004990.0, ans=0.1 2024-08-13 05:14:47,548 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 05:15:14,417 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-13 05:15:16,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-13 05:15:27,716 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 05:15:31,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.461e+01 2.696e+01 3.254e+01 5.243e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-13 05:15:39,708 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-13 05:15:42,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2024-08-13 05:15:46,876 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 05:15:52,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12150, loss[loss=0.1127, beats_loss=0.01032, ecapa_loss=0.0001838, whisper_loss=0.1005, over 22199.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001677, whisper_loss=0.09098, over 3877190.18 frames. ], batch size: 89, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:15:55,311 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 05:15:55,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2005490.0, ans=0.125 2024-08-13 05:15:57,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=12.0 2024-08-13 05:16:01,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2005490.0, ans=0.125 2024-08-13 05:16:02,948 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-13 05:16:18,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2005590.0, ans=0.125 2024-08-13 05:16:53,522 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.166e-01 2024-08-13 05:17:14,029 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 05:17:17,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12200, loss[loss=0.1207, beats_loss=0.008745, ecapa_loss=0.0001849, whisper_loss=0.1101, over 22340.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.000167, whisper_loss=0.09117, over 3846274.62 frames. ], batch size: 92, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:17:26,514 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:17:31,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2005990.0, ans=0.125 2024-08-13 05:17:57,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2006190.0, ans=0.2 2024-08-13 05:18:06,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2006190.0, ans=0.2 2024-08-13 05:18:21,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.467e+01 2.824e+01 3.197e+01 4.821e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 05:18:30,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2006390.0, ans=0.125 2024-08-13 05:18:42,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12250, loss[loss=0.1052, beats_loss=0.012, ecapa_loss=0.0001937, whisper_loss=0.09131, over 21002.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001685, whisper_loss=0.0921, over 3855199.19 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:18:45,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2006490.0, ans=0.0 2024-08-13 05:18:45,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2006490.0, ans=15.0 2024-08-13 05:18:53,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2006490.0, ans=0.125 2024-08-13 05:19:08,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2006590.0, ans=0.0 2024-08-13 05:19:42,677 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 05:19:44,493 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 05:20:04,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12300, loss[loss=0.1226, beats_loss=0.009311, ecapa_loss=0.0001599, whisper_loss=0.1116, over 14570.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001683, whisper_loss=0.09158, over 3828147.83 frames. ], batch size: 55, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:20:05,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-13 05:20:16,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2006990.0, ans=0.125 2024-08-13 05:20:24,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2007090.0, ans=0.2 2024-08-13 05:20:30,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2007090.0, ans=0.125 2024-08-13 05:20:37,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2007190.0, ans=0.1 2024-08-13 05:20:39,826 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 05:20:42,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2007190.0, ans=0.0 2024-08-13 05:20:54,769 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-13 05:20:55,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs. 
limit=22.5 2024-08-13 05:21:01,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2007290.0, ans=0.125 2024-08-13 05:21:06,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.461e+01 2.771e+01 3.048e+01 4.529e+01, threshold=5.542e+01, percent-clipped=0.0 2024-08-13 05:21:30,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12350, loss[loss=0.09024, beats_loss=0.01346, ecapa_loss=0.0001605, whisper_loss=0.07518, over 19334.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001693, whisper_loss=0.09173, over 3826876.35 frames. ], batch size: 78, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:21:36,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2007490.0, ans=0.0 2024-08-13 05:21:40,626 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 05:22:09,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2007690.0, ans=0.0 2024-08-13 05:22:09,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2007690.0, ans=0.2 2024-08-13 05:22:15,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2007690.0, ans=0.125 2024-08-13 05:22:24,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2007790.0, ans=0.0 2024-08-13 05:22:25,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2007790.0, ans=0.07 2024-08-13 05:22:42,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2007890.0, ans=0.125 
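Aside on reading the `loss[...]`/`tot_loss[...]` records above: each batch reports a total loss plus three knowledge-distillation components (BEATs, ECAPA, Whisper). With the scales from this run's configuration (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`), the logged total is consistent with a plain weighted sum. A minimal sketch of that aggregation — the function name `total_loss` is illustrative, not the actual API of `train_multi_KD3.py`:

```python
# Sketch: reconstruct the logged total loss from its components.
# Loss scales are taken from this run's config (see the header dump):
#   beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0
# The helper name `total_loss` is hypothetical, for illustration only.

def total_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three KD objectives, matching the logged `loss=`."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Cross-check against a logged batch:
#   loss=0.1001, beats_loss=0.01212, ecapa_loss=0.0001459, whisper_loss=0.08656
print(round(total_loss(0.01212, 0.0001459, 0.08656), 4))  # → 0.1001
```

The same relation holds for the validation records: each per-task validation (`ASR_libri`, `SV_voxceleb1`, `AT_audioset`) zeroes the components that do not apply to that task, so e.g. the `AT_audioset` line carries only `beats_loss`.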
2024-08-13 05:22:55,514 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12400, loss[loss=0.1279, beats_loss=0.01209, ecapa_loss=0.0001512, whisper_loss=0.1143, over 23030.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001688, whisper_loss=0.09211, over 3874992.38 frames. ], batch size: 92, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:22:57,818 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 05:23:00,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2007990.0, ans=0.125 2024-08-13 05:23:00,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007990.0, ans=0.1 2024-08-13 05:23:01,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2007990.0, ans=0.0 2024-08-13 05:23:03,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. 
limit=12.0 2024-08-13 05:23:13,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2008090.0, ans=0.125 2024-08-13 05:23:15,207 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.336e-01 2024-08-13 05:23:25,525 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:23:25,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2008090.0, ans=0.125 2024-08-13 05:23:29,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2008190.0, ans=0.125 2024-08-13 05:23:34,172 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:23:41,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-13 05:23:46,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2008290.0, ans=0.125 2024-08-13 05:23:59,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.499e+01 2.802e+01 3.094e+01 1.002e+02, threshold=5.604e+01, percent-clipped=2.0 2024-08-13 05:24:05,265 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 05:24:07,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2008390.0, ans=0.1 2024-08-13 05:24:14,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. 
limit=22.5 2024-08-13 05:24:22,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12450, loss[loss=0.0895, beats_loss=0.01171, ecapa_loss=0.0001846, whisper_loss=0.07594, over 18929.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001691, whisper_loss=0.09143, over 3867461.56 frames. ], batch size: 77, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:24:26,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2008490.0, ans=0.2 2024-08-13 05:24:33,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2008490.0, ans=0.05 2024-08-13 05:24:38,036 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 05:24:46,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2008590.0, ans=0.125 2024-08-13 05:24:54,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2008590.0, ans=0.125 2024-08-13 05:25:02,261 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 05:25:02,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2008690.0, ans=0.0 2024-08-13 05:25:08,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2008690.0, ans=0.1 2024-08-13 05:25:36,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2008890.0, ans=0.125 2024-08-13 05:25:50,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12500, loss[loss=0.09965, beats_loss=0.01275, ecapa_loss=0.0001407, whisper_loss=0.08549, over 15899.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001684, whisper_loss=0.09166, over 3892567.96 frames. ], batch size: 63, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:26:00,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2008990.0, ans=0.125 2024-08-13 05:26:03,324 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 05:26:12,642 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 05:26:22,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2009190.0, ans=0.0 2024-08-13 05:26:28,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-13 05:26:37,094 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 05:26:46,146 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 05:26:51,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.389e+01 2.676e+01 3.149e+01 9.586e+01, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 05:27:08,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-13 05:27:11,785 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 05:27:14,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-08-13 05:27:14,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12550, loss[loss=0.09477, beats_loss=0.01241, ecapa_loss=0.0001377, whisper_loss=0.08098, over 15717.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001688, whisper_loss=0.0917, over 3914395.31 frames. ], batch size: 61, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:27:20,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2009490.0, ans=0.125 2024-08-13 05:27:22,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2009490.0, ans=0.2 2024-08-13 05:27:26,707 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 05:27:52,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2009690.0, ans=0.125 2024-08-13 05:28:03,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2009790.0, ans=0.05 2024-08-13 05:28:24,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2009890.0, ans=0.0 2024-08-13 05:28:25,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2009890.0, ans=0.1 2024-08-13 05:28:35,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12600, loss[loss=0.1184, beats_loss=0.01143, ecapa_loss=0.0001511, whisper_loss=0.1055, over 20385.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001679, whisper_loss=0.09166, over 3917714.98 frames. ], batch size: 81, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:29:00,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2010090.0, ans=0.0 2024-08-13 05:29:17,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2010190.0, ans=0.2 2024-08-13 05:29:36,027 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.351e+01 2.664e+01 2.979e+01 4.679e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 05:29:37,074 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 05:29:49,580 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 05:29:53,661 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 05:29:57,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12650, loss[loss=0.1088, beats_loss=0.01213, ecapa_loss=0.0001345, whisper_loss=0.09528, over 23125.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001675, whisper_loss=0.09166, over 3914350.56 frames. ], batch size: 91, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:30:17,662 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 05:30:36,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2024-08-13 05:30:37,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2010690.0, ans=0.125 2024-08-13 05:30:47,290 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 05:30:53,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2010790.0, ans=0.125 2024-08-13 05:30:54,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2010790.0, ans=0.0 2024-08-13 05:30:55,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.91 vs. limit=10.0 2024-08-13 05:31:11,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2010890.0, ans=0.1 2024-08-13 05:31:20,451 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 05:31:21,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12700, loss[loss=0.1099, beats_loss=0.01105, ecapa_loss=0.0001721, whisper_loss=0.09714, over 20697.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09183, over 3936089.47 frames. ], batch size: 82, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:31:23,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2010990.0, ans=0.0 2024-08-13 05:31:38,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2011090.0, ans=0.125 2024-08-13 05:31:46,306 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 05:31:53,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2011190.0, ans=0.125 2024-08-13 05:32:21,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.465e+01 2.775e+01 3.008e+01 5.404e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 05:32:23,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2011290.0, ans=0.125 2024-08-13 05:32:39,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2011390.0, ans=0.125 2024-08-13 05:32:42,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12750, loss[loss=0.07453, beats_loss=0.01363, ecapa_loss=0.0001818, whisper_loss=0.05908, over 20499.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001684, whisper_loss=0.09196, over 3912012.80 frames. 
], batch size: 87, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:32:55,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2011490.0, ans=0.07 2024-08-13 05:33:14,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2011690.0, ans=0.125 2024-08-13 05:33:27,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=2011690.0, ans=15.0 2024-08-13 05:33:38,995 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 05:33:53,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-13 05:33:54,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2011890.0, ans=0.02 2024-08-13 05:33:56,283 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 05:34:03,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12800, loss[loss=0.1184, beats_loss=0.01023, ecapa_loss=0.0001575, whisper_loss=0.1066, over 18762.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001691, whisper_loss=0.09236, over 3927164.55 frames. 
], batch size: 73, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:34:13,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2011990.0, ans=10.0 2024-08-13 05:34:30,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2012090.0, ans=0.2 2024-08-13 05:35:05,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.426e+01 2.719e+01 3.089e+01 6.356e+01, threshold=5.438e+01, percent-clipped=2.0 2024-08-13 05:35:27,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12850, loss[loss=0.1178, beats_loss=0.009614, ecapa_loss=0.0001762, whisper_loss=0.1064, over 22680.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01083, ecapa_loss=0.0001692, whisper_loss=0.09274, over 3911451.84 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:35:39,112 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 05:35:41,027 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 05:35:44,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2012590.0, ans=0.125 2024-08-13 05:35:48,646 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 05:35:52,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2012590.0, ans=0.125 2024-08-13 05:35:58,059 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 8 from Vox, 26 fro AS 2024-08-13 05:35:59,813 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 05:36:06,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2012690.0, ans=0.125 2024-08-13 05:36:06,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2012690.0, ans=0.125 2024-08-13 05:36:12,265 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 05:36:16,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2012790.0, ans=0.125 2024-08-13 05:36:47,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12900, loss[loss=0.09558, beats_loss=0.01368, ecapa_loss=0.0001311, whisper_loss=0.08059, over 22648.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09206, over 3895053.55 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:37:04,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2013090.0, ans=0.2 2024-08-13 05:37:16,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-13 05:37:25,637 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 05:37:26,981 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 05:37:32,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2024-08-13 05:37:34,760 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 05:37:37,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2013290.0, ans=0.125 2024-08-13 05:37:39,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2013290.0, ans=0.1 2024-08-13 05:37:41,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2013290.0, ans=0.02 2024-08-13 05:37:44,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.357e+01 2.603e+01 2.918e+01 4.145e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-13 05:37:47,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2013290.0, ans=0.125 2024-08-13 05:37:49,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2024-08-13 05:37:55,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-13 05:38:07,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 12950, loss[loss=0.09525, beats_loss=0.01054, ecapa_loss=0.0001708, whisper_loss=0.08301, over 19923.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01084, ecapa_loss=0.0001687, whisper_loss=0.09226, over 3907721.28 frames. ], batch size: 79, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:38:07,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-13 05:38:08,579 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 05:38:32,545 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:38:35,057 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 05:38:40,318 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 05:38:46,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2013690.0, ans=0.2 2024-08-13 05:38:51,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2013690.0, ans=0.125 2024-08-13 05:38:56,161 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 05:39:17,472 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 24 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-13 05:39:22,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2013890.0, ans=0.125 2024-08-13 05:39:30,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13000, loss[loss=0.09599, beats_loss=0.0129, ecapa_loss=0.000117, whisper_loss=0.08191, over 16560.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001683, whisper_loss=0.09175, over 3897295.19 frames. ], batch size: 63, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:39:43,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2013990.0, ans=0.95 2024-08-13 05:39:49,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=12.0 2024-08-13 05:39:54,609 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 05:40:01,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2014090.0, ans=0.125 2024-08-13 05:40:12,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2014190.0, ans=0.125 2024-08-13 05:40:31,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-13 05:40:31,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.457e+01 2.798e+01 3.261e+01 6.703e+01, threshold=5.596e+01, percent-clipped=3.0 2024-08-13 05:40:39,400 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 05:40:49,052 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 05:40:51,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2014490.0, ans=0.2 2024-08-13 05:40:52,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13050, loss[loss=0.1003, beats_loss=0.012, ecapa_loss=0.0001766, whisper_loss=0.08651, over 19710.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01093, ecapa_loss=0.0001684, whisper_loss=0.09174, over 3868544.12 frames. ], batch size: 81, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:41:00,756 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 05:41:10,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2014590.0, ans=0.0 2024-08-13 05:41:20,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. 
limit=15.0 2024-08-13 05:41:33,132 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 05:42:12,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13100, loss[loss=0.09106, beats_loss=0.01289, ecapa_loss=0.0002084, whisper_loss=0.07609, over 18953.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001693, whisper_loss=0.0915, over 3843432.69 frames. ], batch size: 81, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:42:15,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2024-08-13 05:42:27,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2015090.0, ans=0.125 2024-08-13 05:42:29,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2015090.0, ans=0.125 2024-08-13 05:42:40,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2015090.0, ans=0.2 2024-08-13 05:43:12,719 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.747e+01 3.007e+01 5.883e+01, threshold=5.493e+01, percent-clipped=1.0 2024-08-13 05:43:31,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2015390.0, ans=0.95 2024-08-13 05:43:33,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13150, loss[loss=0.1087, beats_loss=0.01001, ecapa_loss=0.0001693, whisper_loss=0.09701, over 22141.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001692, whisper_loss=0.09205, over 3847824.10 frames. 
], batch size: 90, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:43:41,365 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 05:44:06,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2015690.0, ans=0.125 2024-08-13 05:44:08,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2015690.0, ans=0.015 2024-08-13 05:44:14,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2015690.0, ans=0.125 2024-08-13 05:44:22,341 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.577e+01 2024-08-13 05:44:28,054 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 05:44:53,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13200, loss[loss=0.1283, beats_loss=0.007679, ecapa_loss=0.0001654, whisper_loss=0.119, over 18916.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001693, whisper_loss=0.09163, over 3832936.86 frames. ], batch size: 72, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:45:11,746 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 05:45:53,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.725e+01 2.981e+01 4.895e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 05:46:03,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2016390.0, ans=0.1 2024-08-13 05:46:14,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13250, loss[loss=0.1056, beats_loss=0.01135, ecapa_loss=0.0001574, whisper_loss=0.09269, over 15930.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001691, whisper_loss=0.09149, over 3817912.64 frames. ], batch size: 64, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:46:35,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2016590.0, ans=0.2 2024-08-13 05:46:54,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2024-08-13 05:47:00,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2016690.0, ans=0.125 2024-08-13 05:47:05,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2016790.0, ans=0.05 2024-08-13 05:47:05,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2016790.0, ans=0.0 2024-08-13 05:47:22,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=12.0 2024-08-13 05:47:40,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2016990.0, ans=0.0 2024-08-13 05:47:41,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13300, loss[loss=0.1028, beats_loss=0.009247, ecapa_loss=0.0001902, whisper_loss=0.09162, over 16472.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001691, whisper_loss=0.09128, over 3823311.61 frames. ], batch size: 67, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:47:41,311 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 05:48:30,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2017290.0, ans=0.0 2024-08-13 05:48:42,651 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.445e+01 2.718e+01 3.162e+01 4.686e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-13 05:48:58,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2017390.0, ans=0.07 2024-08-13 05:48:59,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2017390.0, ans=0.1 2024-08-13 05:48:59,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2017390.0, ans=0.0 2024-08-13 05:49:03,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13350, loss[loss=0.09184, beats_loss=0.007755, ecapa_loss=0.00019, whisper_loss=0.08219, over 16298.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.000168, whisper_loss=0.0912, over 3817244.40 frames. ], batch size: 68, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:49:13,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2017490.0, ans=0.125 2024-08-13 05:49:23,980 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 05:49:50,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-13 05:49:54,101 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 05:49:54,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2017790.0, ans=0.1 2024-08-13 05:49:59,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-13 05:50:26,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13400, loss[loss=0.09823, beats_loss=0.01301, ecapa_loss=0.0001821, whisper_loss=0.0834, over 21841.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001676, whisper_loss=0.09124, over 3825231.68 frames. ], batch size: 93, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:50:33,887 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 05:50:46,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2018090.0, ans=0.125 2024-08-13 05:50:54,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2018090.0, ans=0.2 2024-08-13 05:51:04,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2018190.0, ans=0.05 2024-08-13 05:51:07,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2018190.0, ans=0.125 2024-08-13 05:51:12,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2018190.0, ans=0.0 2024-08-13 05:51:28,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.489e+01 2.760e+01 3.071e+01 5.716e+01, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 05:51:39,220 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2018390.0, ans=0.1 2024-08-13 05:51:42,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2018390.0, ans=0.95 2024-08-13 05:51:50,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13450, loss[loss=0.1051, beats_loss=0.01125, ecapa_loss=0.0002042, whisper_loss=0.09178, over 21782.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001668, whisper_loss=0.09113, over 3831002.65 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:52:05,441 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 05:52:06,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=22.5 2024-08-13 05:52:09,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2018590.0, ans=0.125 2024-08-13 05:52:19,143 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 05:52:30,578 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 05:52:33,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2018690.0, ans=0.0 2024-08-13 05:52:36,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2018690.0, ans=0.0 2024-08-13 05:52:47,562 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 05:52:52,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=22.5 2024-08-13 05:53:14,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13500, loss[loss=0.09367, beats_loss=0.01281, ecapa_loss=0.0001544, whisper_loss=0.07932, over 22740.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001669, whisper_loss=0.09075, over 3870381.17 frames. ], batch size: 94, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:53:18,067 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 35 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 05:53:27,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2018990.0, ans=0.125 2024-08-13 05:53:30,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0 2024-08-13 05:53:32,214 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 05:53:32,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2019090.0, ans=0.09899494936611666 2024-08-13 05:53:35,538 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 05:53:41,656 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-13 05:54:08,094 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 05:54:11,148 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 
27 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 05:54:11,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2019290.0, ans=0.04949747468305833 2024-08-13 05:54:17,719 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.519e+01 2.845e+01 3.228e+01 5.669e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 05:54:19,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2019290.0, ans=0.2 2024-08-13 05:54:22,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2019390.0, ans=0.0 2024-08-13 05:54:23,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=22.5 2024-08-13 05:54:24,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2019390.0, ans=0.0 2024-08-13 05:54:39,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13550, loss[loss=0.07638, beats_loss=0.01283, ecapa_loss=0.0001606, whisper_loss=0.06195, over 13064.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001661, whisper_loss=0.09118, over 3886683.18 frames. ], batch size: 56, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:54:45,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2019490.0, ans=0.0 2024-08-13 05:54:57,391 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 05:55:09,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-08-13 05:55:28,763 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 05:55:42,715 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 05:56:02,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13600, loss[loss=0.08364, beats_loss=0.01291, ecapa_loss=0.0001434, whisper_loss=0.0693, over 18448.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001665, whisper_loss=0.09128, over 3895536.06 frames. ], batch size: 76, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:56:22,670 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 05:56:28,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2020090.0, ans=10.0 2024-08-13 05:56:37,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2020190.0, ans=0.1 2024-08-13 05:56:38,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2020190.0, ans=10.0 2024-08-13 05:56:58,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2020290.0, ans=0.125 2024-08-13 05:57:03,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.439e+01 2.789e+01 3.158e+01 4.809e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 05:57:07,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2020390.0, ans=0.1 2024-08-13 05:57:18,367 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-13 05:57:24,480 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 05:57:25,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13650, loss[loss=0.09824, beats_loss=0.009278, ecapa_loss=0.0002105, whisper_loss=0.08686, over 18785.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01105, ecapa_loss=0.0001667, whisper_loss=0.0913, over 3880448.04 frames. ], batch size: 76, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:57:34,913 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 05:57:40,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-13 05:57:55,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2020590.0, ans=0.0 2024-08-13 05:57:56,150 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 05:58:01,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2020690.0, ans=0.2 2024-08-13 05:58:17,163 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 05:58:43,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2020990.0, ans=0.015 2024-08-13 05:58:44,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2020990.0, ans=0.1 2024-08-13 05:58:45,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13700, loss[loss=0.1101, beats_loss=0.009974, ecapa_loss=0.0001758, whisper_loss=0.09832, over 18537.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01105, ecapa_loss=0.0001661, whisper_loss=0.09098, over 3868664.40 frames. 
], batch size: 75, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:58:49,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2020990.0, ans=0.95 2024-08-13 05:58:56,342 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 05:59:08,218 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-13 05:59:14,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2021190.0, ans=0.0 2024-08-13 05:59:21,753 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 05:59:25,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2021190.0, ans=0.125 2024-08-13 05:59:26,342 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 05:59:39,457 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 05:59:39,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2021290.0, ans=0.125 2024-08-13 05:59:40,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.485e+01 2.717e+01 3.143e+01 5.833e+01, threshold=5.434e+01, percent-clipped=2.0 2024-08-13 05:59:54,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-13 05:59:58,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13750, loss[loss=0.1006, beats_loss=0.007723, ecapa_loss=0.0002442, whisper_loss=0.09044, over 13481.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.0001673, whisper_loss=0.09116, over 3860639.24 frames. ], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:59:59,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2021490.0, ans=0.125 2024-08-13 06:00:34,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-08-13 06:00:43,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2021790.0, ans=0.0 2024-08-13 06:00:45,550 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 06:01:00,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2024-08-13 06:01:06,190 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 06:01:07,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13800, loss[loss=0.09662, beats_loss=0.01081, ecapa_loss=0.0001679, whisper_loss=0.08412, over 15031.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001663, whisper_loss=0.09133, over 3865080.83 frames. ], batch size: 61, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:01:12,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-13 06:01:16,241 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 06:01:26,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2022090.0, ans=0.1 2024-08-13 06:01:27,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2024-08-13 06:01:40,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2022190.0, ans=0.125 2024-08-13 06:01:48,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2022290.0, ans=0.0 2024-08-13 06:01:48,697 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:01:57,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.696e+01 2.984e+01 4.554e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-13 06:02:05,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2022390.0, ans=0.125 2024-08-13 06:02:12,711 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 06:02:15,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13850, loss[loss=0.104, beats_loss=0.01239, ecapa_loss=0.0001381, whisper_loss=0.09027, over 23914.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.000165, whisper_loss=0.09191, over 3923493.00 frames. ], batch size: 94, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:02:15,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2022490.0, ans=0.035 2024-08-13 06:02:17,939 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 06:02:21,193 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-13 06:02:24,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2022490.0, ans=0.0 2024-08-13 06:02:29,385 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 06:02:45,211 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.549e-02 2024-08-13 06:02:46,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-08-13 06:02:47,558 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 06:02:50,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2022690.0, ans=0.125 2024-08-13 06:03:14,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2022890.0, ans=0.125 2024-08-13 06:03:17,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2024-08-13 06:03:22,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2022890.0, ans=0.125 2024-08-13 06:03:24,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13900, loss[loss=0.1092, beats_loss=0.01268, ecapa_loss=0.0001515, whisper_loss=0.09505, over 19817.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001646, whisper_loss=0.0921, over 3920960.29 frames. 
], batch size: 80, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:03:30,348 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 06:03:37,139 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 06:04:03,331 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 06:04:12,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2023290.0, ans=0.2 2024-08-13 06:04:13,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2023290.0, ans=0.0 2024-08-13 06:04:15,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.449e+01 2.734e+01 3.123e+01 1.484e+02, threshold=5.468e+01, percent-clipped=1.0 2024-08-13 06:04:19,833 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 06:04:20,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0 2024-08-13 06:04:21,244 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 06:04:31,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2023390.0, ans=0.125 2024-08-13 06:04:33,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 13950, loss[loss=0.1045, beats_loss=0.007915, ecapa_loss=0.0002118, whisper_loss=0.09449, over 21207.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.09281, over 3889860.05 frames. 
], batch size: 86, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:04:43,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-08-13 06:04:48,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2023590.0, ans=0.125 2024-08-13 06:04:59,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-13 06:05:22,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2023790.0, ans=0.2 2024-08-13 06:05:33,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=12.0 2024-08-13 06:05:41,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14000, loss[loss=0.1119, beats_loss=0.009962, ecapa_loss=0.0001335, whisper_loss=0.1006, over 14686.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01075, ecapa_loss=0.000165, whisper_loss=0.09255, over 3870267.57 frames. 
], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:05:46,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2023990.0, ans=0.0 2024-08-13 06:05:46,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2023990.0, ans=0.125 2024-08-13 06:05:50,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2023990.0, ans=0.2 2024-08-13 06:05:52,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2023990.0, ans=0.0 2024-08-13 06:06:10,612 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 06:06:13,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2024190.0, ans=0.0 2024-08-13 06:06:14,807 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 06:06:23,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2024290.0, ans=0.125 2024-08-13 06:06:26,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2024290.0, ans=0.1 2024-08-13 06:06:32,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.439e+01 2.688e+01 3.210e+01 4.383e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 06:06:38,130 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 06:06:50,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14050, loss[loss=0.1111, beats_loss=0.01227, ecapa_loss=0.0001443, whisper_loss=0.09742, over 23092.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001655, whisper_loss=0.09206, over 3842731.51 frames. ], batch size: 91, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:07:02,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2024490.0, ans=0.125 2024-08-13 06:07:17,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2024690.0, ans=0.125 2024-08-13 06:07:17,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2024690.0, ans=0.0 2024-08-13 06:07:22,421 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 06:07:41,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2024790.0, ans=0.0 2024-08-13 06:07:44,720 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 06:07:48,699 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 06:07:50,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2024890.0, ans=0.0 2024-08-13 06:07:57,104 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 06:07:59,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14100, loss[loss=0.09498, beats_loss=0.0122, ecapa_loss=0.000161, whisper_loss=0.08116, over 17458.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001664, whisper_loss=0.09249, over 3869855.58 frames. 
], batch size: 73, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:08:00,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. limit=6.0 2024-08-13 06:08:01,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-13 06:08:18,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2025090.0, ans=0.0 2024-08-13 06:08:26,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-13 06:08:29,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2025190.0, ans=0.125 2024-08-13 06:08:45,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2025290.0, ans=0.0 2024-08-13 06:08:51,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.495e+01 2.684e+01 2.972e+01 8.600e+01, threshold=5.367e+01, percent-clipped=1.0 2024-08-13 06:08:52,882 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-13 06:09:04,121 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 06:09:05,313 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 06:09:09,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14150, loss[loss=0.104, beats_loss=0.01239, ecapa_loss=0.0001349, whisper_loss=0.09028, over 20552.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01074, ecapa_loss=0.0001661, whisper_loss=0.09322, over 3873217.69 frames. 
], batch size: 80, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:09:26,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2025590.0, ans=0.125 2024-08-13 06:09:43,469 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 06:09:46,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2025690.0, ans=0.2 2024-08-13 06:09:49,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2025790.0, ans=0.1 2024-08-13 06:09:56,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2025790.0, ans=0.125 2024-08-13 06:10:08,416 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-13 06:10:11,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2025890.0, ans=0.125 2024-08-13 06:10:17,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14200, loss[loss=0.08978, beats_loss=0.01416, ecapa_loss=0.0001361, whisper_loss=0.07427, over 18014.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001641, whisper_loss=0.09215, over 3879548.11 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:10:17,692 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 06:10:29,596 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
34 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 06:10:31,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2026090.0, ans=0.125 2024-08-13 06:10:32,468 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 06:10:39,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2026090.0, ans=0.0 2024-08-13 06:10:48,723 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 06:10:49,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2026190.0, ans=0.0 2024-08-13 06:11:01,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2026290.0, ans=0.0 2024-08-13 06:11:07,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.457e+01 2.666e+01 2.949e+01 5.330e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-13 06:11:25,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14250, loss[loss=0.1195, beats_loss=0.01147, ecapa_loss=0.0001626, whisper_loss=0.1064, over 23550.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001639, whisper_loss=0.09267, over 3925994.52 frames. ], batch size: 93, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:11:30,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. 
limit=22.5 2024-08-13 06:11:34,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2026490.0, ans=0.0 2024-08-13 06:11:53,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2026690.0, ans=0.1 2024-08-13 06:11:54,619 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 06:11:56,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2026690.0, ans=0.0 2024-08-13 06:12:05,569 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 06:12:05,794 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:12:09,992 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 06:12:10,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2026790.0, ans=0.0 2024-08-13 06:12:14,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2026790.0, ans=0.1 2024-08-13 06:12:18,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2026790.0, ans=0.0 2024-08-13 06:12:21,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2026890.0, ans=0.125 2024-08-13 06:12:26,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2026890.0, ans=0.0 2024-08-13 06:12:34,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14300, loss[loss=0.09941, beats_loss=0.01121, ecapa_loss=0.0001583, whisper_loss=0.08662, over 15906.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001629, whisper_loss=0.09186, over 3933019.24 frames. ], batch size: 63, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:12:35,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-13 06:12:42,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2026990.0, ans=0.0 2024-08-13 06:13:01,429 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 06:13:01,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. limit=6.0 2024-08-13 06:13:04,265 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 06:13:06,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2027190.0, ans=0.0 2024-08-13 06:13:12,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2027190.0, ans=0.125 2024-08-13 06:13:16,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2027290.0, ans=0.0 2024-08-13 06:13:24,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.493e+01 2.791e+01 3.138e+01 4.573e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 06:13:32,846 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 06:13:41,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14350, loss[loss=0.0993, beats_loss=0.01016, ecapa_loss=0.0001796, whisper_loss=0.08734, over 13978.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.0109, ecapa_loss=0.0001645, whisper_loss=0.09198, over 3888693.97 frames. ], batch size: 56, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:13:43,645 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 06:13:49,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2027490.0, ans=0.1 2024-08-13 06:13:53,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2027490.0, ans=0.1 2024-08-13 06:13:54,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=12.0 2024-08-13 06:14:08,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2024-08-13 06:14:21,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-13 06:14:30,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.04 vs. limit=15.0 2024-08-13 06:14:30,783 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 06:14:45,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2027890.0, ans=0.1 2024-08-13 06:14:52,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14400, loss[loss=0.09064, beats_loss=0.01047, ecapa_loss=0.0001918, whisper_loss=0.07825, over 15632.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001662, whisper_loss=0.09164, over 3905319.25 frames. 
], batch size: 65, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:14:55,292 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 06:14:56,642 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 06:15:03,509 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 06:15:05,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2028090.0, ans=0.2 2024-08-13 06:15:12,048 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:15:13,018 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 06:15:46,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.480e+01 2.712e+01 3.054e+01 1.079e+02, threshold=5.424e+01, percent-clipped=2.0 2024-08-13 06:15:46,395 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 06:15:47,815 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 06:16:04,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.66 vs. limit=10.0 2024-08-13 06:16:06,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 14, batch 14450, loss[loss=0.1095, beats_loss=0.01047, ecapa_loss=0.0001945, whisper_loss=0.09711, over 22448.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01092, ecapa_loss=0.0001669, whisper_loss=0.09124, over 3882653.07 frames. 
], batch size: 92, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:16:08,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2028490.0, ans=0.125 2024-08-13 06:16:30,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2028590.0, ans=0.0 2024-08-13 06:16:35,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2028690.0, ans=0.0 2024-08-13 06:16:39,733 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 06:16:53,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2028790.0, ans=0.125 2024-08-13 06:16:54,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2028790.0, ans=0.09899494936611666 2024-08-13 06:17:52,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 0, loss[loss=0.1169, beats_loss=0.00781, ecapa_loss=0.0001848, whisper_loss=0.1072, over 17564.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.00781, ecapa_loss=0.0001848, whisper_loss=0.1072, over 17564.00 frames. ], batch size: 67, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:17:52,543 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 06:18:35,177 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005623, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 06:18:52,035 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on SV_voxceleb1: loss=0.004582, beats_loss=0, ecapa_loss=0.0004582, whisper_loss=0, over 939242.00 frames. 
2024-08-13 06:19:42,330 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.7504, 1.1558, 1.3742, 1.3546, 1.6301, 1.1800, 1.3600, 1.2825], device='cuda:3') 2024-08-13 06:20:54,482 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on AT_audioset: loss=0.02384, beats_loss=0.02384, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 06:20:54,485 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 06:21:06,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2024-08-13 06:22:26,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2029230.0, ans=0.0 2024-08-13 06:22:31,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-13 06:22:48,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.538e+01 2.901e+01 3.195e+01 5.923e+01, threshold=5.802e+01, percent-clipped=1.0 2024-08-13 06:22:55,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2029330.0, ans=0.1 2024-08-13 06:23:05,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 50, loss[loss=0.101, beats_loss=0.009512, ecapa_loss=0.0002008, whisper_loss=0.08947, over 16148.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01021, ecapa_loss=0.0001731, whisper_loss=0.08996, over 891609.67 frames. 
], batch size: 65, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:23:09,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2029430.0, ans=0.0 2024-08-13 06:23:14,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2029430.0, ans=0.125 2024-08-13 06:23:49,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2029530.0, ans=0.1 2024-08-13 06:24:41,426 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 06:24:52,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2029830.0, ans=0.125 2024-08-13 06:25:04,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 100, loss[loss=0.09824, beats_loss=0.009523, ecapa_loss=0.0001476, whisper_loss=0.08724, over 16458.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01013, ecapa_loss=0.0001696, whisper_loss=0.0893, over 1554207.66 frames. ], batch size: 60, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:25:12,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2029930.0, ans=0.125 2024-08-13 06:25:47,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2030030.0, ans=0.0 2024-08-13 06:25:56,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2030130.0, ans=0.125 2024-08-13 06:26:23,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. 
limit=15.0 2024-08-13 06:26:42,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.792e+01 3.150e+01 3.564e+01 5.697e+01, threshold=6.299e+01, percent-clipped=0.0 2024-08-13 06:26:50,053 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 06:26:50,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2030330.0, ans=0.125 2024-08-13 06:26:56,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 150, loss[loss=0.106, beats_loss=0.01046, ecapa_loss=0.0001629, whisper_loss=0.09396, over 19782.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01012, ecapa_loss=0.0001688, whisper_loss=0.09086, over 2069991.66 frames. ], batch size: 77, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:27:25,075 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 06:27:42,996 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 06:28:19,943 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-13 06:28:23,156 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:28:27,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 200, loss[loss=0.09523, beats_loss=0.01097, ecapa_loss=0.0001562, whisper_loss=0.0827, over 22254.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001667, whisper_loss=0.0905, over 2453955.99 frames. ], batch size: 89, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:28:31,389 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 06:29:09,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2031130.0, ans=0.125 2024-08-13 06:29:25,771 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 06:29:39,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.436e+01 2.755e+01 3.099e+01 4.760e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-13 06:29:47,275 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 06:29:51,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 250, loss[loss=0.1196, beats_loss=0.009981, ecapa_loss=0.0001513, whisper_loss=0.1081, over 23590.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001663, whisper_loss=0.0911, over 2767988.82 frames. ], batch size: 90, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:30:07,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2031530.0, ans=0.0 2024-08-13 06:30:21,049 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 06:30:42,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-13 06:30:51,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2031730.0, ans=0.125 2024-08-13 06:30:58,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=12.0 2024-08-13 06:31:04,684 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 06:31:09,362 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 06:31:13,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 300, loss[loss=0.09251, beats_loss=0.01028, ecapa_loss=0.0001529, whisper_loss=0.0807, over 14078.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001677, whisper_loss=0.09125, over 2999311.47 frames. ], batch size: 55, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:31:18,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2031930.0, ans=0.1 2024-08-13 06:31:24,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.66 vs. limit=15.0 2024-08-13 06:31:29,386 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-13 06:31:30,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=15.0 2024-08-13 06:31:37,560 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 06:31:48,630 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06791721284389496, model_norm_threshold=55.09401321411133 2024-08-13 06:31:48,814 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.429e+05, grad_sumsq=7.164e+04, orig_rms_sq=8.974e+00 2024-08-13 06:31:59,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-13 06:32:14,132 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 06:32:19,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2032230.0, ans=0.0 2024-08-13 06:32:21,402 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 06:32:23,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2032330.0, ans=0.04949747468305833 2024-08-13 06:32:25,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2032330.0, ans=0.125 2024-08-13 06:32:25,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-13 06:32:25,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.443e+01 2.713e+01 2.990e+01 8.112e+02, threshold=5.427e+01, percent-clipped=1.0 2024-08-13 06:32:30,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-13 06:32:37,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 350, loss[loss=0.07488, beats_loss=0.01261, ecapa_loss=0.000145, whisper_loss=0.06083, over 17493.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001672, whisper_loss=0.09043, over 3158176.14 frames. ], batch size: 69, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:33:20,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2032630.0, ans=0.125 2024-08-13 06:33:25,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2032730.0, ans=0.1 2024-08-13 06:33:40,894 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 06:33:57,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 400, loss[loss=0.1158, beats_loss=0.01003, ecapa_loss=0.0002028, whisper_loss=0.1038, over 22615.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001665, whisper_loss=0.09037, over 3308999.53 frames. ], batch size: 94, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:34:02,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2032930.0, ans=0.0 2024-08-13 06:34:03,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2032930.0, ans=0.0 2024-08-13 06:34:05,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.24 vs. limit=10.0 2024-08-13 06:34:16,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2033030.0, ans=0.0 2024-08-13 06:34:38,849 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 06:34:42,556 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 06:34:51,523 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 06:35:04,329 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 06:35:06,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.548e+01 2.826e+01 3.113e+01 9.410e+01, threshold=5.653e+01, percent-clipped=3.0 2024-08-13 06:35:10,288 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 06:35:17,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2033430.0, ans=0.125 2024-08-13 06:35:18,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 450, loss[loss=0.1158, beats_loss=0.009683, ecapa_loss=0.0001457, whisper_loss=0.1046, over 23516.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.000166, whisper_loss=0.09045, over 3421055.37 frames. ], batch size: 90, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:35:18,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2033430.0, ans=0.125 2024-08-13 06:35:21,305 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 06:35:26,860 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-13 06:35:37,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2033530.0, ans=0.1 2024-08-13 06:35:55,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2033630.0, ans=0.0 2024-08-13 06:36:12,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-13 06:36:36,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2033930.0, ans=0.0 2024-08-13 06:36:37,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 500, loss[loss=0.08857, beats_loss=0.01272, ecapa_loss=0.0001499, whisper_loss=0.07435, over 20650.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001652, whisper_loss=0.09047, over 3522755.16 frames. 
], batch size: 84, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:36:37,766 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 06:36:53,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2034030.0, ans=0.125 2024-08-13 06:37:00,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2034030.0, ans=0.1 2024-08-13 06:37:03,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-13 06:37:04,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2034030.0, ans=0.1 2024-08-13 06:37:07,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2034130.0, ans=0.125 2024-08-13 06:37:18,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2034130.0, ans=0.5 2024-08-13 06:37:25,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2024-08-13 06:37:27,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. 
limit=22.5 2024-08-13 06:37:45,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.386e+01 2.704e+01 2.981e+01 6.756e+01, threshold=5.408e+01, percent-clipped=1.0 2024-08-13 06:37:47,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2034330.0, ans=0.0 2024-08-13 06:37:51,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2034330.0, ans=0.0 2024-08-13 06:37:56,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 550, loss[loss=0.09689, beats_loss=0.0117, ecapa_loss=0.0001543, whisper_loss=0.08365, over 20240.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001653, whisper_loss=0.0902, over 3574046.67 frames. ], batch size: 78, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:38:10,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2034430.0, ans=0.2 2024-08-13 06:38:13,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2034530.0, ans=0.07 2024-08-13 06:38:31,254 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 06:38:36,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2034630.0, ans=0.125 2024-08-13 06:38:43,283 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-13 06:39:00,953 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 06:39:06,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2034830.0, ans=0.0 2024-08-13 06:39:15,701 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 06:39:17,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 600, loss[loss=0.08798, beats_loss=0.01363, ecapa_loss=0.0001294, whisper_loss=0.07305, over 17080.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001637, whisper_loss=0.09021, over 3630090.19 frames. ], batch size: 68, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:39:55,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2035130.0, ans=0.125 2024-08-13 06:40:03,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2035230.0, ans=15.0 2024-08-13 06:40:04,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2035230.0, ans=0.125 2024-08-13 06:40:10,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2035230.0, ans=0.0 2024-08-13 06:40:25,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.434e+01 2.721e+01 3.072e+01 6.546e+01, threshold=5.441e+01, percent-clipped=1.0 2024-08-13 06:40:37,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 650, loss[loss=0.09163, beats_loss=0.01162, ecapa_loss=0.0001475, whisper_loss=0.07854, over 17984.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001639, whisper_loss=0.09018, over 3651861.83 frames. ], batch size: 69, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:40:50,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.38 vs. limit=22.5 2024-08-13 06:41:03,596 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
14 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 06:41:03,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2035530.0, ans=0.125 2024-08-13 06:41:03,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2035530.0, ans=0.2 2024-08-13 06:41:10,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2035630.0, ans=0.125 2024-08-13 06:41:19,126 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 06:41:19,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2035630.0, ans=0.07 2024-08-13 06:41:36,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2035730.0, ans=0.125 2024-08-13 06:41:38,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2035730.0, ans=0.125 2024-08-13 06:41:40,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2035730.0, ans=0.125 2024-08-13 06:41:53,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2035830.0, ans=0.0 2024-08-13 06:41:59,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 700, loss[loss=0.0965, beats_loss=0.0109, ecapa_loss=0.0001615, whisper_loss=0.08399, over 21961.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001628, whisper_loss=0.08996, over 3726866.31 frames. ], batch size: 90, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:42:06,970 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-13 06:42:23,809 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 06:42:24,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-13 06:42:31,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2036130.0, ans=0.125 2024-08-13 06:42:32,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2036130.0, ans=6.0 2024-08-13 06:42:54,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-13 06:42:55,904 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 06:43:08,487 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.352e+01 2.612e+01 3.001e+01 5.116e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-13 06:43:19,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 750, loss[loss=0.1237, beats_loss=0.008273, ecapa_loss=0.0002043, whisper_loss=0.1134, over 18973.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001626, whisper_loss=0.09118, over 3752984.86 frames. ], batch size: 74, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:43:19,935 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 06:43:39,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2036530.0, ans=0.125 2024-08-13 06:44:17,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2036730.0, ans=0.125 2024-08-13 06:44:18,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2036730.0, ans=0.125 2024-08-13 06:44:32,454 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 06:44:37,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 800, loss[loss=0.1109, beats_loss=0.01108, ecapa_loss=0.0001406, whisper_loss=0.09837, over 23566.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001622, whisper_loss=0.09148, over 3789780.99 frames. ], batch size: 94, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:44:52,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2037030.0, ans=0.125 2024-08-13 06:44:53,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.672e-01 2024-08-13 06:44:55,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2037030.0, ans=0.2 2024-08-13 06:44:58,181 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 06:45:10,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2037130.0, ans=0.2 2024-08-13 06:45:10,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2037130.0, ans=0.125 2024-08-13 06:45:25,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2037230.0, ans=0.025 2024-08-13 06:45:25,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2037230.0, ans=0.2 2024-08-13 06:45:28,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2024-08-13 06:45:31,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2037230.0, ans=0.2 2024-08-13 06:45:43,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.386e+01 2.631e+01 2.954e+01 1.989e+02, threshold=5.262e+01, percent-clipped=2.0 2024-08-13 06:45:53,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 850, loss[loss=0.0953, beats_loss=0.01199, ecapa_loss=0.0001782, whisper_loss=0.08153, over 22379.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001623, whisper_loss=0.09058, over 3778112.92 frames. ], batch size: 93, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:46:29,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2037630.0, ans=0.125 2024-08-13 06:46:57,881 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 06:47:10,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 900, loss[loss=0.1086, beats_loss=0.0118, ecapa_loss=0.0001629, whisper_loss=0.09517, over 22868.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001621, whisper_loss=0.09068, over 3776544.62 frames. ], batch size: 92, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:47:54,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2038230.0, ans=0.125 2024-08-13 06:48:00,661 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 06:48:03,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2038230.0, ans=0.0 2024-08-13 06:48:08,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2038330.0, ans=0.0 2024-08-13 06:48:12,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.393e+01 2.649e+01 3.126e+01 8.192e+01, threshold=5.298e+01, percent-clipped=1.0 2024-08-13 06:48:19,160 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 06:48:22,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 950, loss[loss=0.08842, beats_loss=0.01327, ecapa_loss=0.0001381, whisper_loss=0.07377, over 17294.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01079, ecapa_loss=0.0001614, whisper_loss=0.0894, over 3780158.14 frames. ], batch size: 70, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:48:23,378 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 06:48:24,601 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 06:48:26,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-08-13 06:48:46,558 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 06:48:53,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2038630.0, ans=0.0 2024-08-13 06:49:18,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-13 06:49:23,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2038730.0, ans=0.125 2024-08-13 06:49:32,619 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 06:49:34,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2038830.0, ans=0.125 2024-08-13 06:49:44,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1000, loss[loss=0.1117, beats_loss=0.008673, ecapa_loss=0.0001982, whisper_loss=0.1011, over 22553.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.000162, whisper_loss=0.08989, over 3767111.66 frames. ], batch size: 93, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:49:48,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=15.0 2024-08-13 06:49:50,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2038930.0, ans=0.125 2024-08-13 06:49:50,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-13 06:49:56,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-13 06:50:10,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-13 06:50:12,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2039030.0, ans=0.1 2024-08-13 06:50:22,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2039130.0, ans=0.125 2024-08-13 06:50:27,506 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 06:50:55,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.424e+01 2.734e+01 3.160e+01 9.771e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-13 06:51:03,341 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.313e+02 2024-08-13 06:51:05,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1050, loss[loss=0.1236, beats_loss=0.009293, ecapa_loss=0.0001849, whisper_loss=0.1124, over 23312.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001622, whisper_loss=0.09048, over 3825102.14 frames. 
], batch size: 92, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:51:11,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2039430.0, ans=0.125 2024-08-13 06:51:12,791 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:51:14,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2039430.0, ans=0.05 2024-08-13 06:51:24,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-13 06:52:05,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-13 06:52:06,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-13 06:52:16,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-08-13 06:52:20,565 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1100, loss[loss=0.09693, beats_loss=0.01004, ecapa_loss=0.0001629, whisper_loss=0.08526, over 19848.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.000163, whisper_loss=0.09039, over 3806828.67 frames. ], batch size: 77, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:52:24,553 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 06:52:26,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2039930.0, ans=0.125 2024-08-13 06:52:28,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2039930.0, ans=0.2 2024-08-13 06:52:37,492 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 06:52:49,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2040130.0, ans=0.125 2024-08-13 06:53:01,419 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-13 06:53:04,551 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 06:53:08,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2040230.0, ans=0.0 2024-08-13 06:53:21,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2040330.0, ans=0.1 2024-08-13 06:53:22,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2040330.0, ans=0.125 2024-08-13 06:53:24,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.366e+01 2.661e+01 3.055e+01 5.230e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-13 06:53:32,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1150, loss[loss=0.1036, beats_loss=0.008003, ecapa_loss=0.0001786, whisper_loss=0.09378, over 22431.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001621, whisper_loss=0.09014, over 3841562.35 frames. 
], batch size: 90, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:53:49,926 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 06:53:51,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2040530.0, ans=0.2 2024-08-13 06:53:51,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2040530.0, ans=0.0 2024-08-13 06:53:51,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-08-13 06:53:54,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0 2024-08-13 06:53:56,684 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 06:54:02,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2040630.0, ans=0.0 2024-08-13 06:54:23,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2040730.0, ans=0.125 2024-08-13 06:54:25,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2040730.0, ans=0.1 2024-08-13 06:54:34,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2040830.0, ans=0.0 2024-08-13 06:54:44,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1200, loss[loss=0.09242, beats_loss=0.01299, ecapa_loss=0.0001392, whisper_loss=0.07804, over 23391.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001627, whisper_loss=0.09035, over 3838373.97 frames. 
], batch size: 96, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:54:50,474 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 7 from Vox, 34 fro AS 2024-08-13 06:55:04,672 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 06:55:06,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-13 06:55:20,945 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 06:55:24,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-13 06:55:38,045 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:55:40,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2041330.0, ans=0.025 2024-08-13 06:55:46,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.369e+01 2.676e+01 3.078e+01 7.518e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 06:55:54,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1250, loss[loss=0.09859, beats_loss=0.01112, ecapa_loss=0.0001418, whisper_loss=0.08606, over 15950.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001618, whisper_loss=0.09065, over 3835184.42 frames. ], batch size: 61, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:56:04,103 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 06:56:11,959 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
13 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 06:56:15,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2041530.0, ans=0.125 2024-08-13 06:56:32,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2041630.0, ans=0.0 2024-08-13 06:56:34,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. limit=10.0 2024-08-13 06:56:43,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2041730.0, ans=15.0 2024-08-13 06:56:44,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2041730.0, ans=0.125 2024-08-13 06:56:49,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2041830.0, ans=0.07 2024-08-13 06:57:01,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1300, loss[loss=0.1102, beats_loss=0.009357, ecapa_loss=0.0002112, whisper_loss=0.09875, over 22504.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001609, whisper_loss=0.09028, over 3842095.01 frames. ], batch size: 92, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:57:03,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2041930.0, ans=0.125 2024-08-13 06:57:19,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-08-13 06:57:44,712 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 06:57:51,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2042230.0, ans=0.0 2024-08-13 06:57:59,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.323e+01 2.630e+01 3.145e+01 6.794e+01, threshold=5.259e+01, percent-clipped=2.0 2024-08-13 06:58:00,646 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 8 from Vox, 41 fro AS 2024-08-13 06:58:02,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2042330.0, ans=0.0 2024-08-13 06:58:02,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-08-13 06:58:07,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1350, loss[loss=0.09764, beats_loss=0.01303, ecapa_loss=0.0001273, whisper_loss=0.08334, over 20846.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001599, whisper_loss=0.0905, over 3858510.36 frames. ], batch size: 81, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:58:16,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2042430.0, ans=0.2 2024-08-13 06:58:19,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2042530.0, ans=0.0 2024-08-13 06:58:19,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2042530.0, ans=0.09899494936611666 2024-08-13 06:58:21,398 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 06:58:34,876 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 06:58:46,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2042730.0, ans=0.0 2024-08-13 06:58:47,705 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 06:58:54,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2042730.0, ans=0.1 2024-08-13 06:59:00,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2042830.0, ans=0.1 2024-08-13 06:59:05,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2042830.0, ans=0.2 2024-08-13 06:59:13,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1400, loss[loss=0.1151, beats_loss=0.00814, ecapa_loss=0.0001875, whisper_loss=0.1051, over 16577.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01082, ecapa_loss=0.0001605, whisper_loss=0.08951, over 3805293.06 frames. ], batch size: 64, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:59:41,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2043130.0, ans=0.0 2024-08-13 06:59:46,058 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 06:59:47,327 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 06:59:54,129 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-13 06:59:56,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.02 vs. 
limit=22.5 2024-08-13 07:00:02,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2043230.0, ans=0.2 2024-08-13 07:00:12,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.355e+01 2.665e+01 2.989e+01 4.736e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-13 07:00:20,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1450, loss[loss=0.0924, beats_loss=0.01034, ecapa_loss=0.0001823, whisper_loss=0.08023, over 16728.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01081, ecapa_loss=0.0001586, whisper_loss=0.08935, over 3796854.05 frames. ], batch size: 69, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:00:50,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2043430.0, ans=0.0 2024-08-13 07:01:04,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2043530.0, ans=0.2 2024-08-13 07:01:06,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2043530.0, ans=0.2 2024-08-13 07:01:10,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2043630.0, ans=0.125 2024-08-13 07:01:14,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2043630.0, ans=0.0 2024-08-13 07:01:34,095 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 07:01:51,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1500, loss[loss=0.1082, beats_loss=0.01145, ecapa_loss=0.0001864, whisper_loss=0.09488, over 20693.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.08952, over 3855147.48 frames. 
], batch size: 82, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:02:07,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2044030.0, ans=0.1 2024-08-13 07:02:09,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-08-13 07:02:28,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2044130.0, ans=0.125 2024-08-13 07:02:28,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2044130.0, ans=0.0 2024-08-13 07:02:34,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2044230.0, ans=0.125 2024-08-13 07:02:41,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-13 07:02:51,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.422e+01 2.612e+01 2.997e+01 7.275e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 07:02:59,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1550, loss[loss=0.1127, beats_loss=0.01132, ecapa_loss=0.0001473, whisper_loss=0.09988, over 23610.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001605, whisper_loss=0.09045, over 3858973.35 frames. 
], batch size: 93, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:03:02,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2044430.0, ans=0.1 2024-08-13 07:03:02,894 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.525e+01 2024-08-13 07:03:05,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2044430.0, ans=0.125 2024-08-13 07:03:14,441 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 07:03:27,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-13 07:03:29,259 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 07:03:46,277 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 07:03:50,106 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 07:04:03,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2044830.0, ans=0.0 2024-08-13 07:04:09,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1600, loss[loss=0.1106, beats_loss=0.009129, ecapa_loss=0.0001535, whisper_loss=0.09998, over 15030.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001606, whisper_loss=0.09052, over 3820101.72 frames. 
], batch size: 57, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:04:20,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2044930.0, ans=0.0 2024-08-13 07:04:23,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2045030.0, ans=0.125 2024-08-13 07:04:25,020 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 07:04:45,911 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 07:04:50,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2045230.0, ans=0.035 2024-08-13 07:04:52,944 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-13 07:04:54,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=22.5 2024-08-13 07:05:10,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.418e+01 2.670e+01 2.986e+01 1.271e+02, threshold=5.339e+01, percent-clipped=4.0 2024-08-13 07:05:20,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1650, loss[loss=0.07387, beats_loss=0.01014, ecapa_loss=0.0001599, whisper_loss=0.06213, over 17380.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.0001607, whisper_loss=0.09013, over 3828582.58 frames. ], batch size: 68, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:05:32,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-13 07:05:38,010 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 07:05:50,168 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 07:06:21,284 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-13 07:06:28,160 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 07:06:29,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1700, loss[loss=0.09552, beats_loss=0.01086, ecapa_loss=0.000185, whisper_loss=0.08282, over 17007.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001605, whisper_loss=0.08975, over 3835288.21 frames. ], batch size: 69, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:06:49,221 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 07:06:54,829 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 07:06:59,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2046130.0, ans=0.0 2024-08-13 07:07:09,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2046130.0, ans=0.125 2024-08-13 07:07:14,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2046230.0, ans=0.125 2024-08-13 07:07:14,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2046230.0, ans=0.0 2024-08-13 07:07:18,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2046230.0, ans=0.125 2024-08-13 07:07:25,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2046330.0, ans=0.125 2024-08-13 
07:07:25,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=2046330.0, ans=22.5 2024-08-13 07:07:30,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2046330.0, ans=0.0 2024-08-13 07:07:31,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.428e+01 2.659e+01 3.089e+01 1.627e+02, threshold=5.319e+01, percent-clipped=1.0 2024-08-13 07:07:38,619 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 11 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 07:07:39,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1750, loss[loss=0.06723, beats_loss=0.01342, ecapa_loss=0.0001584, whisper_loss=0.05222, over 14679.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01077, ecapa_loss=0.0001598, whisper_loss=0.08951, over 3823455.21 frames. ], batch size: 59, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:07:44,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2046430.0, ans=0.0 2024-08-13 07:08:09,511 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-13 07:08:13,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2046630.0, ans=0.125 2024-08-13 07:08:15,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-13 07:08:17,627 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
18 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 07:08:21,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2046730.0, ans=0.0 2024-08-13 07:08:22,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2046730.0, ans=0.0 2024-08-13 07:08:25,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2046730.0, ans=0.125 2024-08-13 07:08:48,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2046930.0, ans=0.09899494936611666 2024-08-13 07:08:49,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1800, loss[loss=0.07153, beats_loss=0.01174, ecapa_loss=0.0001493, whisper_loss=0.0583, over 17026.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001602, whisper_loss=0.0898, over 3814545.41 frames. ], batch size: 68, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:09:00,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2024-08-13 07:09:01,246 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 07:09:13,316 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 07:09:13,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:09:19,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2047130.0, ans=0.125 2024-08-13 07:09:34,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2047230.0, ans=0.125 2024-08-13 07:09:37,176 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 07:09:37,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2024-08-13 07:09:51,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.458e+01 2.695e+01 3.131e+01 5.479e+01, threshold=5.391e+01, percent-clipped=1.0 2024-08-13 07:09:59,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-13 07:09:59,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1850, loss[loss=0.1046, beats_loss=0.01027, ecapa_loss=0.0001791, whisper_loss=0.09252, over 22833.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.000159, whisper_loss=0.09015, over 3855262.42 frames. ], batch size: 91, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:10:08,378 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 07:10:11,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2047430.0, ans=0.2 2024-08-13 07:10:21,856 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 07:10:29,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2047630.0, ans=0.09899494936611666 2024-08-13 07:10:35,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2047630.0, ans=0.125 2024-08-13 07:11:04,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2047830.0, ans=0.2 2024-08-13 07:11:08,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1900, loss[loss=0.1081, beats_loss=0.009379, ecapa_loss=0.0002283, whisper_loss=0.09647, over 15851.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001611, whisper_loss=0.09022, over 3829208.44 frames. ], batch size: 66, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:11:10,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2047930.0, ans=0.125 2024-08-13 07:11:10,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2047930.0, ans=6.0 2024-08-13 07:11:16,807 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 07:11:33,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2048030.0, ans=0.125 2024-08-13 07:11:54,477 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:11:57,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2048230.0, ans=0.1 2024-08-13 07:12:01,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2048230.0, ans=0.0 2024-08-13 07:12:09,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.366e+01 2.636e+01 3.036e+01 8.197e+01, threshold=5.272e+01, percent-clipped=3.0 2024-08-13 07:12:11,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2048330.0, ans=0.09899494936611666 2024-08-13 07:12:17,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 1950, loss[loss=0.08948, beats_loss=0.009383, ecapa_loss=0.0002122, whisper_loss=0.07798, over 15988.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001614, whisper_loss=0.09023, over 3822328.11 frames. ], batch size: 69, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:12:33,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2024-08-13 07:12:37,171 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 07:12:44,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2048530.0, ans=0.0 2024-08-13 07:12:58,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2048630.0, ans=0.0 2024-08-13 07:13:10,677 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 07:13:11,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-13 07:13:33,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2000, loss[loss=0.1157, beats_loss=0.01263, ecapa_loss=0.000133, whisper_loss=0.1018, over 22393.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.000162, whisper_loss=0.09069, over 3836038.47 frames. ], batch size: 88, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:13:45,253 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 07:13:54,273 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 07:14:06,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2049130.0, ans=0.125 2024-08-13 07:14:08,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2049130.0, ans=0.125 2024-08-13 07:14:11,676 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:14:42,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.431e+01 2.683e+01 2.951e+01 6.273e+01, threshold=5.366e+01, percent-clipped=2.0 2024-08-13 07:14:50,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2049430.0, ans=0.07 2024-08-13 07:14:51,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2050, loss[loss=0.09218, beats_loss=0.01126, ecapa_loss=0.0001507, whisper_loss=0.07942, over 14836.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.0001612, whisper_loss=0.09021, over 3832786.48 frames. ], batch size: 60, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:15:02,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-13 07:15:12,111 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 07:15:16,441 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 07:15:35,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2049630.0, ans=0.1 2024-08-13 07:15:42,208 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 07:15:42,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2049730.0, ans=0.2 2024-08-13 07:15:49,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2049730.0, ans=0.0 2024-08-13 07:16:06,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2024-08-13 07:16:08,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2100, loss[loss=0.1157, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.1038, over 18290.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01079, ecapa_loss=0.0001606, whisper_loss=0.09012, over 3849568.88 frames. ], batch size: 69, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:16:10,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2049930.0, ans=0.125 2024-08-13 07:16:38,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2050130.0, ans=0.0 2024-08-13 07:16:40,720 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 07:16:49,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2050130.0, ans=0.125 2024-08-13 07:16:52,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2050230.0, ans=0.2 2024-08-13 07:16:57,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0 2024-08-13 07:16:58,202 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 07:17:12,765 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 07:17:15,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.374e+01 2.616e+01 2.948e+01 7.626e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 07:17:24,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2150, loss[loss=0.08984, beats_loss=0.01374, ecapa_loss=0.0001535, whisper_loss=0.07456, over 22675.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.09017, over 3824626.77 frames. ], batch size: 93, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:17:36,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2050430.0, ans=0.0 2024-08-13 07:17:38,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2050530.0, ans=0.125 2024-08-13 07:17:46,561 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 07:17:57,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-13 07:17:57,869 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 07:18:19,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2050730.0, ans=0.0 2024-08-13 07:18:19,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2050730.0, ans=0.125 2024-08-13 07:18:26,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2050830.0, ans=0.1 2024-08-13 07:18:37,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2200, loss[loss=0.1195, beats_loss=0.01248, ecapa_loss=0.0001353, whisper_loss=0.1057, over 22650.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.00016, whisper_loss=0.09034, over 3827747.72 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:18:52,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2051030.0, ans=0.0 2024-08-13 07:18:53,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2051030.0, ans=0.0 2024-08-13 07:18:55,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-13 07:18:58,390 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 07:19:03,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2051030.0, ans=0.125 2024-08-13 07:19:04,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-13 07:19:07,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2051130.0, ans=0.125 2024-08-13 07:19:11,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2051130.0, ans=0.1 2024-08-13 07:19:17,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-08-13 07:19:34,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2051230.0, ans=0.2 2024-08-13 07:19:43,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.405e+01 2.692e+01 3.101e+01 3.996e+01, threshold=5.385e+01, percent-clipped=0.0 2024-08-13 07:19:52,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2250, loss[loss=0.1195, beats_loss=0.009691, ecapa_loss=0.0001514, whisper_loss=0.1083, over 19163.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001606, whisper_loss=0.09139, over 3814323.46 frames. ], batch size: 71, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:19:57,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-13 07:20:18,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2051530.0, ans=0.125 2024-08-13 07:21:10,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2300, loss[loss=0.1086, beats_loss=0.01014, ecapa_loss=0.0001638, whisper_loss=0.09683, over 21443.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001617, whisper_loss=0.09219, over 3840481.23 frames. 
], batch size: 87, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:21:26,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-13 07:21:35,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5 2024-08-13 07:21:44,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.831e-01 2024-08-13 07:21:52,287 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 07:22:01,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-13 07:22:04,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2052230.0, ans=0.1 2024-08-13 07:22:11,306 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 07:22:20,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.551e+01 2.802e+01 3.286e+01 4.961e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 07:22:24,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2052330.0, ans=0.04949747468305833 2024-08-13 07:22:27,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2350, loss[loss=0.09265, beats_loss=0.01298, ecapa_loss=0.0001861, whisper_loss=0.07781, over 16002.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001626, whisper_loss=0.09185, over 3827891.80 frames. ], batch size: 66, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:22:32,380 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 07:22:34,048 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 07:22:41,407 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 07:22:42,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2052530.0, ans=0.125 2024-08-13 07:22:44,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2052530.0, ans=0.1 2024-08-13 07:22:52,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2052530.0, ans=0.0 2024-08-13 07:22:57,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2052630.0, ans=0.0 2024-08-13 07:22:58,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2052630.0, ans=0.0 2024-08-13 07:23:06,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2052630.0, ans=0.125 2024-08-13 07:23:08,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=12.0 2024-08-13 07:23:08,791 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 07:23:43,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2400, loss[loss=0.08423, beats_loss=0.01241, ecapa_loss=0.0001804, whisper_loss=0.07001, over 13887.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001636, whisper_loss=0.09203, over 3844427.45 frames. 
], batch size: 58, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:23:43,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2052930.0, ans=0.125 2024-08-13 07:23:58,761 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 07:24:02,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2024-08-13 07:24:09,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2053030.0, ans=0.2 2024-08-13 07:24:24,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2024-08-13 07:24:52,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.349e+01 2.648e+01 3.316e+01 5.305e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:25:00,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2450, loss[loss=0.0934, beats_loss=0.01225, ecapa_loss=0.0001702, whisper_loss=0.07945, over 21552.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001639, whisper_loss=0.0921, over 3847561.71 frames. ], batch size: 91, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:25:02,138 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 07:25:02,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2053430.0, ans=0.125 2024-08-13 07:25:16,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2053530.0, ans=0.125 2024-08-13 07:25:23,108 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 07:25:27,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2053530.0, ans=0.125 2024-08-13 07:25:29,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2053630.0, ans=0.1 2024-08-13 07:25:47,487 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-13 07:25:57,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2053730.0, ans=0.2 2024-08-13 07:26:03,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2053830.0, ans=0.125 2024-08-13 07:26:12,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2053830.0, ans=0.125 2024-08-13 07:26:12,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2053830.0, ans=0.125 2024-08-13 07:26:14,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2053830.0, ans=15.0 2024-08-13 07:26:16,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2500, loss[loss=0.13, beats_loss=0.007897, ecapa_loss=0.0001783, whisper_loss=0.1203, over 18862.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01076, ecapa_loss=0.0001637, whisper_loss=0.09237, over 3866452.88 frames. 
], batch size: 74, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:26:24,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2053930.0, ans=0.125 2024-08-13 07:26:26,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2053930.0, ans=0.125 2024-08-13 07:26:30,143 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 07:26:36,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2054030.0, ans=0.125 2024-08-13 07:26:38,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2054030.0, ans=0.125 2024-08-13 07:26:39,351 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 07:26:41,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2054030.0, ans=10.0 2024-08-13 07:26:43,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2054030.0, ans=0.125 2024-08-13 07:26:57,638 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 07:27:08,433 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 07:27:25,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.432e+01 2.694e+01 2.986e+01 7.508e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-13 07:27:27,704 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
43 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 07:27:33,207 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2550, loss[loss=0.07804, beats_loss=0.01376, ecapa_loss=0.0001538, whisper_loss=0.06274, over 14637.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01075, ecapa_loss=0.0001622, whisper_loss=0.09241, over 3863729.65 frames. ], batch size: 59, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:27:39,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2054430.0, ans=0.1 2024-08-13 07:27:43,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2054430.0, ans=0.0 2024-08-13 07:27:48,541 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 07:28:14,569 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 07:28:24,170 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-13 07:28:32,535 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 07:28:38,061 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 07:28:44,382 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 07:28:46,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2054930.0, ans=0.0 2024-08-13 07:28:47,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2600, loss[loss=0.0979, beats_loss=0.01236, ecapa_loss=0.0001825, whisper_loss=0.08371, over 22169.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0107, ecapa_loss=0.0001622, whisper_loss=0.09236, over 3868971.50 frames. 
], batch size: 91, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:28:47,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2054930.0, ans=0.125 2024-08-13 07:28:50,402 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 07:28:54,068 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 07:29:17,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2055130.0, ans=0.2 2024-08-13 07:29:21,572 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 07:29:30,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2055130.0, ans=0.07 2024-08-13 07:29:38,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2055230.0, ans=0.0 2024-08-13 07:29:39,680 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-13 07:29:50,534 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 07:29:56,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.419e+01 2.702e+01 3.112e+01 4.104e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-13 07:29:57,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2055330.0, ans=0.0 2024-08-13 07:30:03,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. 
limit=6.0 2024-08-13 07:30:03,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2650, loss[loss=0.09395, beats_loss=0.008951, ecapa_loss=0.0001525, whisper_loss=0.08348, over 15007.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0107, ecapa_loss=0.0001625, whisper_loss=0.09207, over 3852661.05 frames. ], batch size: 56, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:30:18,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2055530.0, ans=0.0 2024-08-13 07:30:36,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-13 07:30:42,434 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-13 07:30:50,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-13 07:30:51,373 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 07:30:53,932 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 07:30:58,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2055730.0, ans=0.125 2024-08-13 07:31:10,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2055830.0, ans=0.0 2024-08-13 07:31:14,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2055830.0, ans=0.125 2024-08-13 07:31:19,065 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
18 from LS+wenet, 31 from Vox, 43 fro AS 2024-08-13 07:31:20,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2700, loss[loss=0.07773, beats_loss=0.01294, ecapa_loss=0.0001881, whisper_loss=0.06292, over 21394.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001616, whisper_loss=0.09109, over 3873882.58 frames. ], batch size: 92, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:31:30,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2055930.0, ans=0.125 2024-08-13 07:31:33,084 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 07:31:45,907 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 07:32:00,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2056130.0, ans=0.0 2024-08-13 07:32:08,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2056230.0, ans=0.125 2024-08-13 07:32:16,999 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.822e-02 2024-08-13 07:32:20,702 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 07:32:28,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.371e+01 2.713e+01 3.227e+01 1.003e+02, threshold=5.426e+01, percent-clipped=2.0 2024-08-13 07:32:29,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2056330.0, ans=0.5 2024-08-13 07:32:29,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. 
limit=15.0 2024-08-13 07:32:36,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2750, loss[loss=0.09993, beats_loss=0.01219, ecapa_loss=0.0001556, whisper_loss=0.08619, over 22398.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09139, over 3882631.99 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:32:38,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2056430.0, ans=0.125 2024-08-13 07:32:45,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2056430.0, ans=0.0 2024-08-13 07:32:47,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2056430.0, ans=0.125 2024-08-13 07:32:57,325 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 07:33:00,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2056530.0, ans=0.0 2024-08-13 07:33:03,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2056530.0, ans=0.1 2024-08-13 07:33:13,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2056630.0, ans=0.125 2024-08-13 07:33:29,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. 
limit=15.0 2024-08-13 07:33:33,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2056730.0, ans=0.125 2024-08-13 07:33:40,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-13 07:33:47,925 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 07:33:52,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0 2024-08-13 07:33:55,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2800, loss[loss=0.08515, beats_loss=0.01121, ecapa_loss=0.0001803, whisper_loss=0.07214, over 18969.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001607, whisper_loss=0.09136, over 3869999.15 frames. ], batch size: 81, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:34:05,100 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 34 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 07:34:27,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2057130.0, ans=0.125 2024-08-13 07:34:32,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2057130.0, ans=0.125 2024-08-13 07:34:44,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2057230.0, ans=0.125 2024-08-13 07:34:47,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2024-08-13 07:34:54,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2057230.0, ans=0.1 2024-08-13 07:34:54,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2057230.0, ans=0.0 2024-08-13 07:34:55,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2057230.0, ans=0.2 2024-08-13 07:35:08,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.448e+01 2.685e+01 2.951e+01 5.516e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-13 07:35:12,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2057330.0, ans=0.125 2024-08-13 07:35:15,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2850, loss[loss=0.1108, beats_loss=0.009575, ecapa_loss=0.0001739, whisper_loss=0.09948, over 22270.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01082, ecapa_loss=0.0001612, whisper_loss=0.09197, over 3876254.53 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:35:23,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2057430.0, ans=0.1 2024-08-13 07:35:37,326 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 07:35:57,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2057630.0, ans=0.125 2024-08-13 07:36:06,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2057730.0, ans=0.0 2024-08-13 07:36:12,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2057730.0, ans=0.2 2024-08-13 07:36:23,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2024-08-13 07:36:24,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2057830.0, ans=0.0 2024-08-13 07:36:29,783 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 07:36:38,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2900, loss[loss=0.08241, beats_loss=0.01293, ecapa_loss=0.0001392, whisper_loss=0.06808, over 18948.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001625, whisper_loss=0.09158, over 3866297.83 frames. ], batch size: 76, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:36:42,935 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
29 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 07:36:49,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2057930.0, ans=0.0 2024-08-13 07:37:10,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2058130.0, ans=10.0 2024-08-13 07:37:14,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2058130.0, ans=0.125 2024-08-13 07:37:38,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2058230.0, ans=0.1 2024-08-13 07:37:49,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2058330.0, ans=0.2 2024-08-13 07:37:49,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.692e+01 3.123e+01 5.434e+01, threshold=5.383e+01, percent-clipped=1.0 2024-08-13 07:37:58,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 2950, loss[loss=0.1083, beats_loss=0.01057, ecapa_loss=0.0001478, whisper_loss=0.09624, over 23288.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001642, whisper_loss=0.09151, over 3887732.88 frames. ], batch size: 90, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:38:07,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2058430.0, ans=0.0 2024-08-13 07:38:39,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2058630.0, ans=0.1 2024-08-13 07:39:09,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. 
limit=22.5 2024-08-13 07:39:22,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2058930.0, ans=0.125 2024-08-13 07:39:23,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3000, loss[loss=0.1003, beats_loss=0.00962, ecapa_loss=0.0001794, whisper_loss=0.08889, over 17051.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001641, whisper_loss=0.09234, over 3912985.45 frames. ], batch size: 68, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:39:23,156 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 07:40:01,997 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005768, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 07:40:19,485 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on SV_voxceleb1: loss=0.00457, beats_loss=0, ecapa_loss=0.000457, whisper_loss=0, over 939242.00 frames. 2024-08-13 07:41:17,347 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4524e-04, 4.2803e-03, 1.8960e-03, 3.4210e+00, 1.0445e-02, 3.1920e-02, 1.7707e-02, 4.6392e-02], device='cuda:3') 2024-08-13 07:42:09,772 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 07:42:09,780 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 07:42:41,462 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-13 07:42:41,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2059030.0, ans=0.0 2024-08-13 07:42:49,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2059130.0, ans=0.0 2024-08-13 07:42:54,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2059130.0, ans=0.125 2024-08-13 07:43:09,901 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 07:43:29,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-13 07:43:30,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.520e+01 2.888e+01 3.342e+01 5.667e+01, threshold=5.776e+01, percent-clipped=1.0 2024-08-13 07:43:35,402 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 07:43:36,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-08-13 07:43:38,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3050, loss[loss=0.0938, beats_loss=0.01296, ecapa_loss=0.0001819, whisper_loss=0.07902, over 21055.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001643, whisper_loss=0.0928, over 3929271.28 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:43:41,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-08-13 07:43:47,312 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 07:44:14,676 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 07:44:21,755 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 07:44:25,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2059630.0, ans=0.015 2024-08-13 07:44:34,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2059730.0, ans=0.125 2024-08-13 07:44:50,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=2059830.0, ans=22.5 2024-08-13 07:44:57,170 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 07:45:04,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3100, loss[loss=0.1009, beats_loss=0.01287, ecapa_loss=0.0001551, whisper_loss=0.08652, over 22085.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.0927, over 3925546.91 frames. ], batch size: 92, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:45:13,356 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 07:45:21,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2060030.0, ans=0.125 2024-08-13 07:45:29,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2060030.0, ans=0.0 2024-08-13 07:45:39,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2060130.0, ans=0.2 2024-08-13 07:45:47,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2060130.0, ans=0.125 2024-08-13 07:45:49,845 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 07:46:06,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-08-13 07:46:16,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2060330.0, ans=0.95 2024-08-13 07:46:22,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.355e+01 2.648e+01 2.914e+01 4.175e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:46:24,263 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 07:46:25,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2060330.0, ans=0.0 2024-08-13 07:46:30,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2060430.0, ans=0.2 2024-08-13 07:46:30,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3150, loss[loss=0.1175, beats_loss=0.008129, ecapa_loss=0.0001839, whisper_loss=0.1075, over 14322.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01089, ecapa_loss=0.000167, whisper_loss=0.09205, over 3903235.00 frames. ], batch size: 56, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:46:38,601 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 07:46:51,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2060530.0, ans=0.05 2024-08-13 07:47:00,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2060530.0, ans=0.025 2024-08-13 07:47:07,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-13 07:47:39,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2060730.0, ans=0.125 2024-08-13 07:47:43,261 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 07:47:54,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2060830.0, ans=0.125 2024-08-13 07:47:59,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3200, loss[loss=0.1062, beats_loss=0.01228, ecapa_loss=0.0001553, whisper_loss=0.09236, over 22739.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01077, ecapa_loss=0.0001681, whisper_loss=0.09307, over 3903347.84 frames. 
], batch size: 92, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:48:13,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2060930.0, ans=0.0 2024-08-13 07:48:17,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2061030.0, ans=0.1 2024-08-13 07:48:45,238 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 07:48:53,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2061230.0, ans=0.125 2024-08-13 07:48:55,687 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 07:48:56,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2024-08-13 07:49:17,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.368e+01 2.637e+01 3.039e+01 1.272e+02, threshold=5.274e+01, percent-clipped=1.0 2024-08-13 07:49:18,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2061330.0, ans=0.125 2024-08-13 07:49:25,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3250, loss[loss=0.114, beats_loss=0.01061, ecapa_loss=0.0001797, whisper_loss=0.1016, over 15070.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001673, whisper_loss=0.09264, over 3874563.50 frames. 
], batch size: 61, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:49:31,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2061430.0, ans=0.0 2024-08-13 07:49:40,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=12.0 2024-08-13 07:49:47,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2061530.0, ans=0.2 2024-08-13 07:50:05,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2061630.0, ans=0.125 2024-08-13 07:50:09,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2061630.0, ans=0.0 2024-08-13 07:50:21,443 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 07:50:28,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2061730.0, ans=0.0 2024-08-13 07:50:47,527 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 07:50:50,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3300, loss[loss=0.106, beats_loss=0.01203, ecapa_loss=0.000192, whisper_loss=0.09205, over 14851.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01081, ecapa_loss=0.0001678, whisper_loss=0.09276, over 3923254.36 frames. ], batch size: 64, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:51:07,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2024-08-13 07:51:15,709 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 07:51:57,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2062230.0, ans=0.1 2024-08-13 07:52:01,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2024-08-13 07:52:02,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2062330.0, ans=0.2 2024-08-13 07:52:03,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2062330.0, ans=0.2 2024-08-13 07:52:08,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.435e+01 2.813e+01 3.312e+01 6.245e+01, threshold=5.626e+01, percent-clipped=3.0 2024-08-13 07:52:11,484 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 07:52:15,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3350, loss[loss=0.09723, beats_loss=0.01393, ecapa_loss=0.0001308, whisper_loss=0.08199, over 22240.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01083, ecapa_loss=0.0001667, whisper_loss=0.09278, over 3922447.91 frames. ], batch size: 88, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:52:16,101 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 07:52:19,180 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-13 07:52:39,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2062530.0, ans=0.125 2024-08-13 07:52:40,570 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 07:52:51,564 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
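In the `optim.py` records, the reported `threshold` is exactly `Clipping_scale` times the median of the grad-norm quartiles (e.g. 2.0 × 2.648e+01 = 5.296e+01 in the first record above). A sketch of that relationship, treating the quartile statistics as given (the helper name is hypothetical):

```python
# The log's gradient-clipping threshold appears to be derived as
# Clipping_scale * median(grad_norm); this reproduces the logged values.
# How the quartiles themselves are accumulated is not shown in the log.
def clip_threshold(grad_norm_median, clipping_scale=2.0):
    """Threshold above which a batch counts toward percent-clipped."""
    return clipping_scale * grad_norm_median

# Median quartile 2.648e+01 from the first optim.py record:
print(clip_threshold(2.648e+01))  # matches the logged threshold=5.296e+01
```

Batches whose grad-norm exceeds this threshold show up in `percent-clipped` (0.0 when even the logged maximum stays below it).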
29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 07:53:06,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2062730.0, ans=0.1 2024-08-13 07:53:35,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2062830.0, ans=0.0 2024-08-13 07:53:38,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3400, loss[loss=0.08053, beats_loss=0.01119, ecapa_loss=0.0002323, whisper_loss=0.06702, over 14334.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01083, ecapa_loss=0.0001661, whisper_loss=0.09196, over 3904251.23 frames. ], batch size: 63, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:53:38,473 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-13 07:53:40,228 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 07:53:47,782 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 07:54:03,023 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 07:54:08,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2063030.0, ans=0.125 2024-08-13 07:54:29,658 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 07:54:33,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2063230.0, ans=0.125 2024-08-13 07:54:35,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2063230.0, ans=0.125 2024-08-13 07:54:35,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2063230.0, ans=0.125 2024-08-13 07:54:53,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.342e+01 2.542e+01 2.769e+01 4.852e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-13 07:55:00,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3450, loss[loss=0.1088, beats_loss=0.009906, ecapa_loss=0.000206, whisper_loss=0.0968, over 20842.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001666, whisper_loss=0.09135, over 3905998.06 frames. ], batch size: 87, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:55:10,233 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 07:55:19,153 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 07:55:36,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2063630.0, ans=0.0 2024-08-13 07:56:04,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2063730.0, ans=0.0 2024-08-13 07:56:09,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2063830.0, ans=0.125 2024-08-13 07:56:22,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3500, loss[loss=0.07494, beats_loss=0.01291, ecapa_loss=0.0001591, whisper_loss=0.06045, over 17687.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001677, whisper_loss=0.09111, over 3869915.53 frames. ], batch size: 71, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:56:27,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2063930.0, ans=0.0 2024-08-13 07:56:40,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2064030.0, ans=0.04949747468305833 2024-08-13 07:56:42,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2064030.0, ans=0.125 2024-08-13 07:57:07,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2064130.0, ans=0.125 2024-08-13 07:57:11,235 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-13 07:57:11,532 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.188e-03 2024-08-13 07:57:17,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2064230.0, ans=0.125 2024-08-13 07:57:26,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2064230.0, ans=0.125 2024-08-13 07:57:36,543 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 07:57:37,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.517e+01 2.798e+01 3.148e+01 5.290e+01, threshold=5.596e+01, percent-clipped=1.0 2024-08-13 07:57:47,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3550, loss[loss=0.1215, beats_loss=0.008961, ecapa_loss=0.000179, whisper_loss=0.1108, over 19297.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09059, over 3886041.28 frames. ], batch size: 75, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:57:57,497 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 07:58:03,926 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 07:58:10,322 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 07:58:10,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2064530.0, ans=0.125 2024-08-13 07:59:11,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3600, loss[loss=0.103, beats_loss=0.009827, ecapa_loss=0.0001775, whisper_loss=0.09143, over 17060.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01094, ecapa_loss=0.0001675, whisper_loss=0.08989, over 3859368.49 frames. ], batch size: 69, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:59:15,511 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 07:59:23,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2064930.0, ans=0.0 2024-08-13 07:59:26,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2065030.0, ans=0.0 2024-08-13 07:59:46,329 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 07:59:49,621 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 08:00:27,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.371e+01 2.703e+01 3.040e+01 5.839e+01, threshold=5.406e+01, percent-clipped=1.0 2024-08-13 08:00:36,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3650, loss[loss=0.09788, beats_loss=0.01147, ecapa_loss=0.0001751, whisper_loss=0.08465, over 20534.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01093, ecapa_loss=0.0001672, whisper_loss=0.0899, over 3818669.65 frames. ], batch size: 84, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:00:42,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2065430.0, ans=0.125 2024-08-13 08:00:53,359 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 08:00:54,716 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 08:01:09,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2065530.0, ans=0.125 2024-08-13 08:01:16,752 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 08:01:20,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2065630.0, ans=0.1 2024-08-13 08:01:27,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2065730.0, ans=0.05 2024-08-13 08:01:27,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. 
limit=10.0 2024-08-13 08:01:53,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2065830.0, ans=0.125 2024-08-13 08:02:01,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3700, loss[loss=0.1012, beats_loss=0.009648, ecapa_loss=0.0002131, whisper_loss=0.08946, over 22059.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01095, ecapa_loss=0.0001678, whisper_loss=0.08958, over 3819180.83 frames. ], batch size: 92, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:02:05,860 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 08:02:11,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2065930.0, ans=0.125 2024-08-13 08:02:16,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2066030.0, ans=0.0 2024-08-13 08:02:41,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2066130.0, ans=0.125 2024-08-13 08:02:56,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2024-08-13 08:02:57,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2024-08-13 08:03:04,442 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 08:03:07,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2066330.0, ans=10.0 2024-08-13 08:03:09,393 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
34 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 08:03:13,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.360e+01 2.624e+01 2.875e+01 4.532e+01, threshold=5.249e+01, percent-clipped=0.0 2024-08-13 08:03:20,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3750, loss[loss=0.1043, beats_loss=0.009382, ecapa_loss=0.0001669, whisper_loss=0.09322, over 16537.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01092, ecapa_loss=0.0001673, whisper_loss=0.09007, over 3845778.60 frames. ], batch size: 63, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:03:35,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2066430.0, ans=0.125 2024-08-13 08:03:36,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2066530.0, ans=0.125 2024-08-13 08:04:04,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2066630.0, ans=0.0 2024-08-13 08:04:16,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2066730.0, ans=0.2 2024-08-13 08:04:31,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.78 vs. limit=10.0 2024-08-13 08:04:37,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2066830.0, ans=0.125 2024-08-13 08:04:40,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2066830.0, ans=0.125 2024-08-13 08:04:43,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3800, loss[loss=0.09172, beats_loss=0.01294, ecapa_loss=0.0001569, whisper_loss=0.07721, over 20264.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001679, whisper_loss=0.09038, over 3840102.21 frames. ], batch size: 82, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:04:44,668 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:04:44,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2066930.0, ans=0.1 2024-08-13 08:05:13,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2067030.0, ans=0.125 2024-08-13 08:05:13,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2067030.0, ans=0.04949747468305833 2024-08-13 08:05:24,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2067130.0, ans=0.2 2024-08-13 08:05:29,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2067130.0, ans=0.125 2024-08-13 08:05:34,973 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 32 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 08:05:40,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.92 vs. 
limit=15.0 2024-08-13 08:05:45,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2067230.0, ans=0.125 2024-08-13 08:05:51,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2067330.0, ans=0.125 2024-08-13 08:05:55,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.450e+01 2.726e+01 3.001e+01 5.077e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-13 08:06:03,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3850, loss[loss=0.09782, beats_loss=0.01201, ecapa_loss=0.0001689, whisper_loss=0.08411, over 15717.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01093, ecapa_loss=0.0001666, whisper_loss=0.09006, over 3825246.41 frames. ], batch size: 62, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:06:04,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2067430.0, ans=0.125 2024-08-13 08:06:16,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-13 08:06:42,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-08-13 08:06:43,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2067630.0, ans=0.0 2024-08-13 08:06:44,818 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.175e-01 2024-08-13 08:06:52,357 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 08:06:57,814 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 08:07:03,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2067730.0, ans=0.015 2024-08-13 08:07:09,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2067730.0, ans=0.035 2024-08-13 08:07:24,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2067830.0, ans=0.125 2024-08-13 08:07:24,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2067830.0, ans=0.0 2024-08-13 08:07:26,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2067830.0, ans=0.0 2024-08-13 08:07:29,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3900, loss[loss=0.1016, beats_loss=0.0124, ecapa_loss=0.0001662, whisper_loss=0.0875, over 23133.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01094, ecapa_loss=0.0001668, whisper_loss=0.09057, over 3875449.91 frames. ], batch size: 95, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:07:34,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2067930.0, ans=0.125 2024-08-13 08:07:39,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2067930.0, ans=0.1 2024-08-13 08:07:45,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2068030.0, ans=0.1 2024-08-13 08:08:04,220 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 08:08:09,139 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
19 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 08:08:10,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2068130.0, ans=0.2 2024-08-13 08:08:13,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2068130.0, ans=0.0 2024-08-13 08:08:21,637 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 08:08:25,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=15.0 2024-08-13 08:08:37,698 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 08:08:43,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.513e+01 2.786e+01 3.251e+01 6.128e+01, threshold=5.571e+01, percent-clipped=2.0 2024-08-13 08:08:52,107 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 3950, loss[loss=0.09351, beats_loss=0.0108, ecapa_loss=0.0001971, whisper_loss=0.08074, over 21754.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001676, whisper_loss=0.09077, over 3874730.45 frames. ], batch size: 93, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:09:01,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2068430.0, ans=0.125 2024-08-13 08:09:05,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2068430.0, ans=0.125 2024-08-13 08:09:27,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2068630.0, ans=0.0 2024-08-13 08:09:32,145 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
34 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 08:09:39,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2068630.0, ans=0.0 2024-08-13 08:09:39,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-13 08:10:14,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2068830.0, ans=0.1 2024-08-13 08:10:20,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4000, loss[loss=0.0887, beats_loss=0.01134, ecapa_loss=0.0002004, whisper_loss=0.07536, over 20518.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001678, whisper_loss=0.09142, over 3894991.05 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:10:30,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2068930.0, ans=0.2 2024-08-13 08:10:42,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2069030.0, ans=0.0 2024-08-13 08:10:59,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2069130.0, ans=0.0 2024-08-13 08:11:12,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2069230.0, ans=0.2 2024-08-13 08:11:15,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2069230.0, ans=0.0 2024-08-13 08:11:33,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. 
limit=22.5 2024-08-13 08:11:34,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.360e+01 2.605e+01 2.925e+01 4.033e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 08:11:35,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2069330.0, ans=0.0 2024-08-13 08:11:43,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4050, loss[loss=0.1132, beats_loss=0.01085, ecapa_loss=0.0001173, whisper_loss=0.1012, over 19770.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09171, over 3868625.25 frames. ], batch size: 72, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:11:46,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=8.0 2024-08-13 08:12:12,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2069530.0, ans=0.125 2024-08-13 08:12:20,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. 
limit=6.0 2024-08-13 08:12:27,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2069630.0, ans=0.0 2024-08-13 08:12:31,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2069630.0, ans=0.0 2024-08-13 08:12:56,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2069830.0, ans=0.035 2024-08-13 08:12:57,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2069830.0, ans=0.0 2024-08-13 08:13:00,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2024-08-13 08:13:03,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2069830.0, ans=15.0 2024-08-13 08:13:09,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4100, loss[loss=0.1119, beats_loss=0.01022, ecapa_loss=0.0001401, whisper_loss=0.1003, over 20942.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001676, whisper_loss=0.0919, over 3890994.27 frames. ], batch size: 81, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:13:25,205 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 08:13:32,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-08-13 08:13:38,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2070030.0, ans=0.2 2024-08-13 08:13:54,118 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 08:14:01,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2070230.0, ans=0.025 2024-08-13 08:14:14,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2070230.0, ans=0.1 2024-08-13 08:14:19,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2070330.0, ans=0.0 2024-08-13 08:14:24,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.381e+01 2.702e+01 3.113e+01 4.589e+01, threshold=5.403e+01, percent-clipped=0.0 2024-08-13 08:14:27,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2070330.0, ans=0.125 2024-08-13 08:14:33,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4150, loss[loss=0.105, beats_loss=0.01042, ecapa_loss=0.0001957, whisper_loss=0.09266, over 19225.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001678, whisper_loss=0.09182, over 3876087.07 frames. ], batch size: 80, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:14:35,852 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 08:14:46,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2070430.0, ans=0.125 2024-08-13 08:14:54,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2070530.0, ans=0.0 2024-08-13 08:15:14,290 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 08:15:15,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.64 vs. 
limit=22.5 2024-08-13 08:15:17,599 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 08:15:26,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2070730.0, ans=0.95 2024-08-13 08:15:27,871 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 08:15:41,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2070830.0, ans=0.0 2024-08-13 08:15:56,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4200, loss[loss=0.1219, beats_loss=0.01189, ecapa_loss=0.0001606, whisper_loss=0.1084, over 22664.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001684, whisper_loss=0.09179, over 3873087.92 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:16:04,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2070930.0, ans=0.125 2024-08-13 08:16:16,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-13 08:16:19,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2071030.0, ans=0.125 2024-08-13 08:16:20,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2071030.0, ans=0.0 2024-08-13 08:16:22,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2071030.0, ans=0.125 2024-08-13 08:16:25,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. 
limit=6.0 2024-08-13 08:16:35,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2071130.0, ans=0.1 2024-08-13 08:17:08,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2071330.0, ans=0.125 2024-08-13 08:17:08,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2071330.0, ans=0.125 2024-08-13 08:17:10,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.330e+01 2.608e+01 3.052e+01 6.792e+01, threshold=5.217e+01, percent-clipped=3.0 2024-08-13 08:17:18,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4250, loss[loss=0.1053, beats_loss=0.01086, ecapa_loss=0.000135, whisper_loss=0.09305, over 15425.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001681, whisper_loss=0.09146, over 3882243.37 frames. ], batch size: 61, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:17:36,456 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 08:18:04,128 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 08:18:06,435 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 31 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 08:18:19,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2071730.0, ans=0.125 2024-08-13 08:18:40,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4300, loss[loss=0.101, beats_loss=0.01085, ecapa_loss=0.0002142, whisper_loss=0.08797, over 19321.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.000167, whisper_loss=0.09111, over 3912387.02 frames. 
], batch size: 82, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:18:49,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2071930.0, ans=0.0 2024-08-13 08:18:54,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071930.0, ans=0.1 2024-08-13 08:18:56,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2072030.0, ans=0.0 2024-08-13 08:18:59,773 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 8 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 08:19:18,725 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 08:19:24,561 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 08:19:26,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5 2024-08-13 08:19:27,485 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 08:19:51,935 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-13 08:19:53,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.714e+01 2.965e+01 4.296e+01, threshold=5.429e+01, percent-clipped=0.0 2024-08-13 08:19:56,629 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 36 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 08:20:00,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4350, loss[loss=0.105, beats_loss=0.01137, ecapa_loss=0.0001625, whisper_loss=0.09202, over 21848.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.000166, whisper_loss=0.09161, over 3880057.92 frames. 
], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:20:20,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0 2024-08-13 08:20:31,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2072630.0, ans=0.125 2024-08-13 08:20:53,777 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 08:21:01,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-13 08:21:04,240 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-13 08:21:07,410 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 08:21:12,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2072830.0, ans=0.125 2024-08-13 08:21:23,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4400, loss[loss=0.1179, beats_loss=0.009692, ecapa_loss=0.0001863, whisper_loss=0.1063, over 17821.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001651, whisper_loss=0.09115, over 3874346.88 frames. ], batch size: 71, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:21:25,507 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 08:22:09,103 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 08:22:33,061 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
22 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 08:22:39,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2073330.0, ans=0.05 2024-08-13 08:22:40,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.445e+01 2.799e+01 3.126e+01 5.864e+01, threshold=5.599e+01, percent-clipped=1.0 2024-08-13 08:22:40,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2073330.0, ans=0.125 2024-08-13 08:22:47,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4450, loss[loss=0.1102, beats_loss=0.01096, ecapa_loss=0.0001696, whisper_loss=0.09757, over 22778.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001654, whisper_loss=0.09134, over 3903263.52 frames. ], batch size: 94, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:22:47,790 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 08:23:14,296 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 17 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-13 08:23:31,741 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 08:23:35,054 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 08:23:38,711 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:23:58,962 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 08:24:07,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4500, loss[loss=0.1022, beats_loss=0.009534, ecapa_loss=0.0001989, whisper_loss=0.09071, over 17361.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001657, whisper_loss=0.09099, over 3890426.03 frames. 
], batch size: 72, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:24:15,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-13 08:24:19,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-13 08:24:22,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2074030.0, ans=0.125 2024-08-13 08:24:59,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2074230.0, ans=0.035 2024-08-13 08:25:09,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2074330.0, ans=0.125 2024-08-13 08:25:13,478 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-13 08:25:15,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.670e+01 3.024e+01 4.135e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-13 08:25:23,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4550, loss[loss=0.07993, beats_loss=0.01145, ecapa_loss=0.0001688, whisper_loss=0.06679, over 14331.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001663, whisper_loss=0.09118, over 3887311.73 frames. ], batch size: 57, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:25:34,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2074430.0, ans=0.125 2024-08-13 08:25:35,815 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 08:25:44,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2074530.0, ans=0.125 2024-08-13 08:25:51,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=12.0 2024-08-13 08:26:01,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2074630.0, ans=0.125 2024-08-13 08:26:21,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2024-08-13 08:26:21,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2074830.0, ans=0.125 2024-08-13 08:26:26,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2074830.0, ans=0.125 2024-08-13 08:26:34,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4600, loss[loss=0.09529, beats_loss=0.011, ecapa_loss=0.0001316, whisper_loss=0.08298, over 20770.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001666, whisper_loss=0.09095, over 3917013.24 frames. 
], batch size: 79, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:26:45,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2074930.0, ans=0.125 2024-08-13 08:27:00,656 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.151e-02 2024-08-13 08:27:04,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2075130.0, ans=0.0 2024-08-13 08:27:25,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2075230.0, ans=0.125 2024-08-13 08:27:28,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2075230.0, ans=0.025 2024-08-13 08:27:38,373 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 08:27:42,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.369e+01 2.617e+01 2.923e+01 4.349e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 08:27:43,873 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 08:27:45,173 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 08:27:49,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4650, loss[loss=0.1185, beats_loss=0.01129, ecapa_loss=0.0001483, whisper_loss=0.1057, over 21864.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001674, whisper_loss=0.09128, over 3909386.30 frames. ], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:28:03,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.35 vs. 
limit=15.0 2024-08-13 08:28:13,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-13 08:28:18,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-13 08:28:35,318 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 08:28:37,944 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 08:28:38,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2075730.0, ans=0.1 2024-08-13 08:29:04,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4700, loss[loss=0.0974, beats_loss=0.009413, ecapa_loss=0.0001903, whisper_loss=0.08609, over 20485.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.0001663, whisper_loss=0.09231, over 3914441.26 frames. ], batch size: 84, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:29:10,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2075930.0, ans=0.05 2024-08-13 08:29:39,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2076130.0, ans=0.125 2024-08-13 08:29:44,417 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 8 from Vox, 32 fro AS 2024-08-13 08:29:44,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2076130.0, ans=0.0 2024-08-13 08:29:45,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=15.0 2024-08-13 08:30:13,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.493e+01 2.764e+01 3.080e+01 1.960e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-13 08:30:15,814 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-13 08:30:20,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4750, loss[loss=0.1317, beats_loss=0.00909, ecapa_loss=0.0002205, whisper_loss=0.1204, over 14376.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.0001652, whisper_loss=0.09232, over 3915409.18 frames. ], batch size: 55, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:30:28,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2076430.0, ans=0.0 2024-08-13 08:30:38,379 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 08:31:03,453 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 08:31:05,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-13 08:31:13,461 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 08:31:21,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.26 vs. limit=22.5 2024-08-13 08:31:34,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4800, loss[loss=0.1171, beats_loss=0.01034, ecapa_loss=0.0001626, whisper_loss=0.1051, over 21180.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001668, whisper_loss=0.09212, over 3936104.10 frames. 
], batch size: 83, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:31:39,633 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 08:31:58,044 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 08:32:02,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2077030.0, ans=0.09899494936611666 2024-08-13 08:32:05,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2077130.0, ans=0.125 2024-08-13 08:32:07,788 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 08:32:09,484 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 08:32:20,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2077230.0, ans=0.125 2024-08-13 08:32:42,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.414e+01 2.705e+01 2.995e+01 6.816e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 08:32:49,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-08-13 08:32:49,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4850, loss[loss=0.1073, beats_loss=0.008921, ecapa_loss=0.000174, whisper_loss=0.09667, over 20455.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001662, whisper_loss=0.09177, over 3948132.86 frames. 
], batch size: 82, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:33:01,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2077430.0, ans=0.1 2024-08-13 08:33:11,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2077530.0, ans=0.09899494936611666 2024-08-13 08:33:19,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2077630.0, ans=0.125 2024-08-13 08:33:45,010 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 08:33:54,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2077830.0, ans=0.2 2024-08-13 08:33:59,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2077830.0, ans=0.125 2024-08-13 08:33:59,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2077830.0, ans=0.2 2024-08-13 08:34:02,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4900, loss[loss=0.1233, beats_loss=0.008825, ecapa_loss=0.000182, whisper_loss=0.1126, over 22420.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001662, whisper_loss=0.09183, over 3906179.84 frames. ], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:34:03,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2077930.0, ans=0.125 2024-08-13 08:34:06,381 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 08:34:07,679 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-13 08:34:14,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-08-13 08:34:29,075 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-13 08:34:42,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2078130.0, ans=0.125 2024-08-13 08:35:01,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2078330.0, ans=0.125 2024-08-13 08:35:06,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.464e+01 2.767e+01 3.041e+01 1.306e+02, threshold=5.534e+01, percent-clipped=2.0 2024-08-13 08:35:12,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 4950, loss[loss=0.09399, beats_loss=0.009578, ecapa_loss=0.0001524, whisper_loss=0.08288, over 16382.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001665, whisper_loss=0.09128, over 3893446.47 frames. ], batch size: 64, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:35:23,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0 2024-08-13 08:35:32,746 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 08:35:35,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2078530.0, ans=0.125 2024-08-13 08:35:47,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2078630.0, ans=0.125 2024-08-13 08:35:50,850 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 08:36:17,504 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-13 08:36:20,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2078830.0, ans=0.125 2024-08-13 08:36:20,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2078830.0, ans=0.0 2024-08-13 08:36:20,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2078830.0, ans=0.0 2024-08-13 08:36:22,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5000, loss[loss=0.09771, beats_loss=0.01165, ecapa_loss=0.0001495, whisper_loss=0.08456, over 19988.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001664, whisper_loss=0.09201, over 3889621.08 frames. ], batch size: 80, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:36:24,285 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 08:36:26,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-08-13 08:36:28,319 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 08:36:33,588 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 08:36:52,449 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 08:37:24,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.436e+01 2.730e+01 3.076e+01 4.220e+01, threshold=5.460e+01, percent-clipped=0.0 2024-08-13 08:37:24,292 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 08:37:30,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5050, loss[loss=0.1234, beats_loss=0.007797, ecapa_loss=0.0002272, whisper_loss=0.1134, over 19484.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001665, whisper_loss=0.09133, over 3886594.45 frames. ], batch size: 80, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:37:38,612 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 08:37:41,250 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 08:37:41,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2079430.0, ans=0.0 2024-08-13 08:37:45,229 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 08:37:48,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2079530.0, ans=0.125 2024-08-13 08:37:49,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2079530.0, ans=0.1 2024-08-13 08:38:03,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2079630.0, ans=0.09899494936611666 2024-08-13 08:38:15,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2079730.0, ans=0.125 2024-08-13 08:38:18,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-13 08:38:23,100 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 08:38:25,894 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 08:38:37,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5100, loss[loss=0.1091, beats_loss=0.01226, ecapa_loss=0.0001286, whisper_loss=0.09552, over 22615.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001652, whisper_loss=0.09143, over 3895221.46 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:38:42,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2079930.0, ans=0.1 2024-08-13 08:39:13,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2080130.0, ans=0.1 2024-08-13 08:39:24,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2080230.0, ans=0.125 2024-08-13 08:39:25,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2080230.0, ans=0.0 2024-08-13 08:39:31,042 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 08:39:41,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.313e+01 2.678e+01 2.870e+01 5.220e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-13 08:39:48,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5150, loss[loss=0.1182, beats_loss=0.00877, ecapa_loss=0.0001681, whisper_loss=0.1078, over 23204.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01089, ecapa_loss=0.0001653, whisper_loss=0.09246, over 3923018.33 frames. ], batch size: 87, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:39:48,633 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 08:39:53,094 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 08:39:57,229 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 08:40:23,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2080630.0, ans=0.0 2024-08-13 08:40:28,275 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 08:40:41,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2080730.0, ans=0.125 2024-08-13 08:40:57,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5200, loss[loss=0.1076, beats_loss=0.01096, ecapa_loss=0.0001272, whisper_loss=0.09532, over 18371.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01089, ecapa_loss=0.0001653, whisper_loss=0.09231, over 3910511.78 frames. ], batch size: 70, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:41:06,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:41:09,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:41:17,423 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 08:41:17,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2081030.0, ans=0.125 2024-08-13 08:41:30,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2081130.0, ans=0.125 2024-08-13 08:41:34,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2081130.0, ans=10.0 2024-08-13 08:41:43,906 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 08:41:45,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2081230.0, ans=0.0 2024-08-13 08:41:46,689 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 08:41:48,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2081230.0, ans=0.125 2024-08-13 08:41:59,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.338e+01 2.575e+01 2.873e+01 5.976e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-13 08:42:04,102 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 08:42:06,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5250, loss[loss=0.1175, beats_loss=0.01114, ecapa_loss=0.0001733, whisper_loss=0.1046, over 21993.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001655, whisper_loss=0.09143, over 3898230.82 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:42:09,302 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 08:42:11,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2081430.0, ans=0.125 2024-08-13 08:42:20,021 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 08:42:32,593 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 08:42:37,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2081630.0, ans=0.1 2024-08-13 08:42:43,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2081630.0, ans=0.125 2024-08-13 08:42:58,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2081730.0, ans=0.1 2024-08-13 08:43:02,093 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 08:43:05,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2081830.0, ans=0.125 2024-08-13 08:43:14,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5300, loss[loss=0.07346, beats_loss=0.01174, ecapa_loss=0.0001715, whisper_loss=0.06001, over 15633.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001675, whisper_loss=0.09156, over 3905526.43 frames. ], batch size: 65, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:43:14,911 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 08:43:16,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2081930.0, ans=0.1 2024-08-13 08:43:19,080 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-13 08:43:24,635 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 08:43:25,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. 
limit=15.0 2024-08-13 08:43:26,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=15.0 2024-08-13 08:43:27,333 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 08:43:31,405 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 08:43:42,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2082130.0, ans=0.0 2024-08-13 08:43:59,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2082230.0, ans=0.125 2024-08-13 08:44:02,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2082230.0, ans=0.125 2024-08-13 08:44:03,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2082230.0, ans=0.2 2024-08-13 08:44:16,007 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 08:44:16,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2082330.0, ans=0.0 2024-08-13 08:44:17,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.493e+01 2.716e+01 3.005e+01 4.281e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 08:44:24,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5350, loss[loss=0.09006, beats_loss=0.01299, ecapa_loss=0.0001893, whisper_loss=0.07518, over 18590.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001673, whisper_loss=0.09151, over 3893120.24 frames. 
], batch size: 78, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:44:25,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2082430.0, ans=10.0 2024-08-13 08:44:39,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2082530.0, ans=0.0 2024-08-13 08:44:40,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2082530.0, ans=0.2 2024-08-13 08:44:44,186 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 08:44:47,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=22.5 2024-08-13 08:45:01,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2024-08-13 08:45:05,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-13 08:45:22,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2082830.0, ans=0.0 2024-08-13 08:45:25,452 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 08:45:32,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5400, loss[loss=0.08602, beats_loss=0.01257, ecapa_loss=0.0001562, whisper_loss=0.07189, over 17583.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001651, whisper_loss=0.09135, over 3900185.70 frames. ], batch size: 73, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:45:41,892 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 08:45:44,621 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 08:45:46,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2083030.0, ans=0.125 2024-08-13 08:45:55,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2083030.0, ans=0.125 2024-08-13 08:46:09,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=12.0 2024-08-13 08:46:13,218 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 08:46:24,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2083230.0, ans=0.0 2024-08-13 08:46:34,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.479e+01 2.684e+01 3.109e+01 1.549e+02, threshold=5.369e+01, percent-clipped=2.0 2024-08-13 08:46:40,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2083430.0, ans=0.125 2024-08-13 08:46:40,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5450, loss[loss=0.1082, beats_loss=0.00959, ecapa_loss=0.0001775, whisper_loss=0.09679, over 20383.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01067, ecapa_loss=0.0001648, whisper_loss=0.09217, over 3911982.50 frames. ], batch size: 81, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:46:42,572 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 08:46:48,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2083430.0, ans=0.05 2024-08-13 08:47:15,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2083630.0, ans=0.1 2024-08-13 08:47:43,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-13 08:47:59,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5500, loss[loss=0.1053, beats_loss=0.00926, ecapa_loss=0.0001564, whisper_loss=0.09449, over 18877.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001656, whisper_loss=0.09196, over 3910708.64 frames. ], batch size: 72, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:48:03,068 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 08:48:03,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2083930.0, ans=0.125 2024-08-13 08:48:05,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0 2024-08-13 08:48:58,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2084230.0, ans=0.2 2024-08-13 08:49:06,464 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 08:49:11,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=10.0 2024-08-13 08:49:12,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2084330.0, ans=0.125 2024-08-13 08:49:16,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.483e+01 2.752e+01 3.066e+01 5.816e+01, threshold=5.504e+01, percent-clipped=1.0 2024-08-13 08:49:26,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5550, loss[loss=0.1073, beats_loss=0.01061, ecapa_loss=0.0001764, whisper_loss=0.09496, over 22987.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.0001648, whisper_loss=0.09223, over 3922604.77 frames. ], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:49:54,438 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 08:49:54,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2084530.0, ans=0.125 2024-08-13 08:50:12,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2084630.0, ans=0.125 2024-08-13 08:50:59,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5600, loss[loss=0.102, beats_loss=0.0112, ecapa_loss=0.0001744, whisper_loss=0.08902, over 21482.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01073, ecapa_loss=0.0001643, whisper_loss=0.09236, over 3916089.97 frames. ], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:51:17,574 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-13 08:51:22,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2085030.0, ans=0.0 2024-08-13 08:51:28,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2085030.0, ans=0.125 2024-08-13 08:51:35,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2085030.0, ans=0.0 2024-08-13 08:51:44,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2085130.0, ans=0.125 2024-08-13 08:51:44,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2085130.0, ans=0.125 2024-08-13 08:51:48,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2085130.0, ans=0.0 2024-08-13 08:52:33,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.389e+01 2.717e+01 3.076e+01 5.909e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 08:52:38,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2024-08-13 08:52:43,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5650, loss[loss=0.08657, beats_loss=0.01262, ecapa_loss=0.0001633, whisper_loss=0.07232, over 18164.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01073, ecapa_loss=0.000164, whisper_loss=0.09251, over 3927667.66 frames. ], batch size: 75, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:52:48,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-08-13 08:52:54,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2085430.0, ans=0.125 2024-08-13 08:54:03,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2024-08-13 08:54:17,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5700, loss[loss=0.1057, beats_loss=0.01034, ecapa_loss=0.0001745, whisper_loss=0.09357, over 19845.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01082, ecapa_loss=0.000164, whisper_loss=0.0921, over 3947204.38 frames. ], batch size: 81, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:54:22,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2085930.0, ans=0.0 2024-08-13 08:54:23,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2085930.0, ans=0.125 2024-08-13 08:54:23,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2085930.0, ans=0.125 2024-08-13 08:54:29,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2085930.0, ans=0.125 2024-08-13 08:54:33,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2086030.0, ans=0.125 2024-08-13 08:54:38,944 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 08:54:44,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=15.0 2024-08-13 08:54:49,062 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 08:54:54,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2024-08-13 08:55:25,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.405e+01 2.655e+01 3.007e+01 4.478e+01, threshold=5.310e+01, percent-clipped=0.0 2024-08-13 08:55:30,624 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 08:55:33,831 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5750, loss[loss=0.09224, beats_loss=0.01149, ecapa_loss=0.0001188, whisper_loss=0.07955, over 20740.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001627, whisper_loss=0.09207, over 3944685.21 frames. ], batch size: 79, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:56:12,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2086630.0, ans=0.0 2024-08-13 08:56:27,984 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 08:56:31,159 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 08:56:32,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-13 08:56:33,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2086730.0, ans=0.0 2024-08-13 08:56:35,703 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 08:56:41,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2086830.0, ans=0.2 2024-08-13 08:56:51,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5800, loss[loss=0.1019, beats_loss=0.01264, ecapa_loss=0.0001383, whisper_loss=0.08782, over 19492.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001633, whisper_loss=0.09199, over 3918624.45 frames. ], batch size: 75, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:57:23,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2087130.0, ans=0.125 2024-08-13 08:57:32,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2087130.0, ans=0.0 2024-08-13 08:57:35,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2024-08-13 08:57:48,495 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 08:58:02,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.414e+01 2.686e+01 3.038e+01 9.495e+01, threshold=5.372e+01, percent-clipped=3.0 2024-08-13 08:58:04,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2087330.0, ans=0.125 2024-08-13 08:58:08,837 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 08:58:09,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5850, loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001663, whisper_loss=0.09163, over 21813.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01094, ecapa_loss=0.0001631, whisper_loss=0.09103, over 3920113.04 frames. 
], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:58:15,894 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 08:58:31,935 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-13 08:58:40,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2087630.0, ans=0.1 2024-08-13 08:59:05,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2087730.0, ans=0.04949747468305833 2024-08-13 08:59:16,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2087830.0, ans=0.0 2024-08-13 08:59:25,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2087830.0, ans=0.125 2024-08-13 08:59:27,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2087930.0, ans=0.0 2024-08-13 08:59:28,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5900, loss[loss=0.09012, beats_loss=0.01024, ecapa_loss=0.0001992, whisper_loss=0.07789, over 20005.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0109, ecapa_loss=0.0001643, whisper_loss=0.09057, over 3894553.68 frames. ], batch size: 81, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:59:34,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.52 vs. 
limit=22.5 2024-08-13 08:59:35,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2087930.0, ans=0.125 2024-08-13 08:59:45,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2088030.0, ans=0.0 2024-08-13 08:59:46,459 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 08:59:48,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2024-08-13 08:59:59,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2088130.0, ans=0.0 2024-08-13 09:00:02,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-13 09:00:07,589 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.735e+01 2024-08-13 09:00:11,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2088130.0, ans=0.0 2024-08-13 09:00:21,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2088230.0, ans=0.125 2024-08-13 09:00:21,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2088230.0, ans=0.1 2024-08-13 09:00:27,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2088230.0, ans=0.125 2024-08-13 09:00:27,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.47 vs. 
limit=15.0 2024-08-13 09:00:39,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.419e+01 2.634e+01 3.004e+01 5.084e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-13 09:00:43,268 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 09:00:47,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 5950, loss[loss=0.1096, beats_loss=0.01141, ecapa_loss=0.0001373, whisper_loss=0.09679, over 19246.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01094, ecapa_loss=0.0001643, whisper_loss=0.0904, over 3914701.75 frames. ], batch size: 74, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:01:02,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2088530.0, ans=0.125 2024-08-13 09:01:21,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2088630.0, ans=0.0 2024-08-13 09:01:41,983 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 09:01:46,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.96 vs. limit=22.5 2024-08-13 09:01:48,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2088730.0, ans=0.015 2024-08-13 09:01:52,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-08-13 09:02:07,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6000, loss[loss=0.1149, beats_loss=0.009913, ecapa_loss=0.0001415, whisper_loss=0.1036, over 21317.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001645, whisper_loss=0.09135, over 3912189.33 frames. 
], batch size: 78, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:02:07,001 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 09:02:46,569 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005583, whisper_loss=0.2489, over 922467.00 frames. 2024-08-13 09:03:03,951 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on SV_voxceleb1: loss=0.004508, beats_loss=0, ecapa_loss=0.0004508, whisper_loss=0, over 939242.00 frames. 2024-08-13 09:04:10,067 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7667, 1.9849, 2.1922, 1.7561, 1.2461, 2.1296, 2.8212, 1.4735], device='cuda:3') 2024-08-13 09:04:53,308 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8965, 3.4101, 3.5595, 3.4879], device='cuda:3') 2024-08-13 09:05:03,053 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 09:05:03,057 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 09:05:04,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2088930.0, ans=0.1 2024-08-13 09:05:38,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2089130.0, ans=0.0 2024-08-13 09:05:44,178 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 09:06:05,173 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
20 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 09:06:06,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2089330.0, ans=0.0 2024-08-13 09:06:12,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.429e+01 2.733e+01 3.006e+01 6.424e+01, threshold=5.466e+01, percent-clipped=1.0 2024-08-13 09:06:12,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2089330.0, ans=0.07 2024-08-13 09:06:19,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6050, loss[loss=0.06924, beats_loss=0.01549, ecapa_loss=0.000154, whisper_loss=0.05221, over 16880.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.000165, whisper_loss=0.09141, over 3911094.76 frames. ], batch size: 72, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:06:22,417 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-13 09:06:23,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2089430.0, ans=0.125 2024-08-13 09:06:23,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2089430.0, ans=0.125 2024-08-13 09:06:30,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2089430.0, ans=0.1 2024-08-13 09:06:37,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2089530.0, ans=0.2 2024-08-13 09:06:38,932 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 09:06:51,906 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 09:06:54,517 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 09:07:06,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2089630.0, ans=0.0 2024-08-13 09:07:09,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2089730.0, ans=0.125 2024-08-13 09:07:11,596 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-13 09:07:23,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-13 09:07:27,311 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 09:07:34,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2089830.0, ans=0.0 2024-08-13 09:07:41,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6100, loss[loss=0.0988, beats_loss=0.01066, ecapa_loss=0.0001871, whisper_loss=0.08627, over 17035.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001652, whisper_loss=0.09109, over 3917891.38 frames. ], batch size: 69, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:08:07,510 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 09:08:20,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2090130.0, ans=0.125 2024-08-13 09:08:38,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.41 vs. 
limit=12.0 2024-08-13 09:08:51,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2090330.0, ans=0.125 2024-08-13 09:08:54,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2090330.0, ans=0.09899494936611666 2024-08-13 09:08:55,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.306e+01 2.537e+01 2.839e+01 1.271e+02, threshold=5.074e+01, percent-clipped=1.0 2024-08-13 09:09:03,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6150, loss[loss=0.1088, beats_loss=0.009153, ecapa_loss=0.0001842, whisper_loss=0.09783, over 22444.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001662, whisper_loss=0.09138, over 3925343.86 frames. ], batch size: 92, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:09:03,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2090430.0, ans=0.1 2024-08-13 09:09:45,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2090630.0, ans=0.1 2024-08-13 09:09:47,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2090630.0, ans=0.0 2024-08-13 09:09:50,194 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 09:10:05,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2024-08-13 09:10:12,750 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 09:10:14,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2090830.0, ans=0.0 2024-08-13 09:10:23,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6200, loss[loss=0.1148, beats_loss=0.009463, ecapa_loss=0.0001319, whisper_loss=0.104, over 16946.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001652, whisper_loss=0.09068, over 3880514.55 frames. ], batch size: 64, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:10:25,872 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 09:10:32,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2090930.0, ans=0.5 2024-08-13 09:10:51,073 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 09:10:52,448 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 42 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 09:10:55,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-13 09:10:59,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2091130.0, ans=0.1 2024-08-13 09:11:00,846 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 09:11:09,157 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 09:11:11,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2091130.0, ans=0.1 2024-08-13 09:11:17,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2091230.0, ans=0.125 2024-08-13 09:11:17,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2091230.0, ans=0.125 2024-08-13 09:11:33,865 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 09:11:37,976 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.446e+01 2.761e+01 3.049e+01 5.001e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-13 09:11:45,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6250, loss[loss=0.09033, beats_loss=0.01206, ecapa_loss=0.0001553, whisper_loss=0.07672, over 16579.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001653, whisper_loss=0.09076, over 3899997.17 frames. ], batch size: 66, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:11:54,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2091430.0, ans=0.0 2024-08-13 09:12:18,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2091630.0, ans=0.04949747468305833 2024-08-13 09:12:26,296 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 09:12:31,665 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 09:12:43,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2091730.0, ans=0.125 2024-08-13 09:12:45,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2091730.0, ans=0.0 2024-08-13 09:13:01,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2091830.0, ans=0.0 2024-08-13 09:13:05,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6300, loss[loss=0.09713, beats_loss=0.01323, ecapa_loss=0.0001899, whisper_loss=0.082, over 20876.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001657, whisper_loss=0.09174, over 3893926.87 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:13:20,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2092030.0, ans=0.1 2024-08-13 09:13:24,557 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 09:13:30,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2024-08-13 09:13:43,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2092130.0, ans=0.1 2024-08-13 09:14:00,113 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 09:14:14,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2092330.0, ans=0.95 2024-08-13 09:14:16,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.468e+01 2.785e+01 3.208e+01 1.167e+02, threshold=5.571e+01, percent-clipped=1.0 2024-08-13 09:14:19,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2092330.0, ans=0.0 2024-08-13 09:14:24,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6350, loss[loss=0.09966, beats_loss=0.01028, ecapa_loss=0.0001763, whisper_loss=0.08762, over 18650.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.000166, whisper_loss=0.09235, over 3879587.16 frames. ], batch size: 78, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:14:27,814 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 09:14:30,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2092430.0, ans=0.125 2024-08-13 09:14:30,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2092430.0, ans=0.0 2024-08-13 09:14:33,944 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 09:14:36,543 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 09:14:45,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2092530.0, ans=0.125 2024-08-13 09:14:49,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.66 vs. 
limit=15.0 2024-08-13 09:14:58,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2092630.0, ans=0.125 2024-08-13 09:15:25,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2092830.0, ans=0.125 2024-08-13 09:15:35,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6400, loss[loss=0.09896, beats_loss=0.01113, ecapa_loss=0.0002066, whisper_loss=0.08577, over 20615.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01074, ecapa_loss=0.0001669, whisper_loss=0.09298, over 3870169.24 frames. ], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:15:42,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2092930.0, ans=0.125 2024-08-13 09:15:53,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2093030.0, ans=0.0 2024-08-13 09:16:13,300 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 09:16:20,980 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 09:16:22,467 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 09:16:25,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2093230.0, ans=0.125 2024-08-13 09:16:32,098 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 09:16:34,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.483e+01 2.753e+01 3.245e+01 5.103e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-13 09:16:41,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6450, loss[loss=0.09736, beats_loss=0.01294, ecapa_loss=0.0001358, whisper_loss=0.08307, over 15694.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001669, whisper_loss=0.09253, over 3892341.42 frames. ], batch size: 58, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:16:41,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2093430.0, ans=0.125 2024-08-13 09:16:52,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2093430.0, ans=0.0 2024-08-13 09:17:01,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2093530.0, ans=0.125 2024-08-13 09:17:04,857 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 09:17:25,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-13 09:17:26,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2093730.0, ans=0.125 2024-08-13 09:17:32,864 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 09:17:37,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2093830.0, ans=0.0 2024-08-13 09:17:44,305 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 09:17:46,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6500, loss[loss=0.07186, beats_loss=0.01345, ecapa_loss=0.0001302, whisper_loss=0.05711, over 15433.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01083, ecapa_loss=0.0001647, whisper_loss=0.09312, over 3905216.61 frames. ], batch size: 62, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:18:01,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2094030.0, ans=0.0 2024-08-13 09:18:05,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2024-08-13 09:18:07,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2094030.0, ans=0.125 2024-08-13 09:18:21,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2094130.0, ans=0.0 2024-08-13 09:18:26,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2094230.0, ans=0.125 2024-08-13 09:18:43,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-13 09:18:43,756 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 09:18:46,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.544e+01 2.898e+01 3.309e+01 5.602e+01, threshold=5.795e+01, percent-clipped=1.0 2024-08-13 09:18:52,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6550, loss[loss=0.1047, beats_loss=0.00838, ecapa_loss=0.0001976, whisper_loss=0.09438, over 20266.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01082, ecapa_loss=0.0001636, whisper_loss=0.09314, over 3880020.52 frames. ], batch size: 80, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:19:10,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2094530.0, ans=0.0 2024-08-13 09:19:10,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2024-08-13 09:19:12,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2094530.0, ans=0.125 2024-08-13 09:19:21,769 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 09:19:29,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2094630.0, ans=0.125 2024-08-13 09:19:36,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2024-08-13 09:19:41,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2094730.0, ans=0.125 2024-08-13 09:19:41,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2024-08-13 09:19:49,640 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 09:19:52,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2094830.0, ans=0.0 2024-08-13 09:19:52,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2094830.0, ans=0.125 2024-08-13 09:19:57,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6600, loss[loss=0.1145, beats_loss=0.0081, ecapa_loss=0.0001609, whisper_loss=0.1048, over 20853.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01072, ecapa_loss=0.0001638, whisper_loss=0.09337, over 3906357.46 frames. ], batch size: 81, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:19:58,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2094930.0, ans=0.1 2024-08-13 09:20:15,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0 2024-08-13 09:20:15,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-13 09:20:26,973 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-13 09:20:35,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2095130.0, ans=0.04949747468305833 2024-08-13 09:20:39,946 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 09:20:44,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. 
limit=12.0 2024-08-13 09:20:56,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.422e+01 2.623e+01 3.004e+01 7.541e+01, threshold=5.247e+01, percent-clipped=2.0 2024-08-13 09:21:01,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2095330.0, ans=0.09899494936611666 2024-08-13 09:21:03,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6650, loss[loss=0.09746, beats_loss=0.01254, ecapa_loss=0.0002029, whisper_loss=0.08288, over 20972.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0001636, whisper_loss=0.09287, over 3927608.32 frames. ], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:21:10,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2095430.0, ans=0.1 2024-08-13 09:21:44,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2095730.0, ans=0.2 2024-08-13 09:21:44,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2095730.0, ans=0.1 2024-08-13 09:21:54,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2095730.0, ans=0.0 2024-08-13 09:22:01,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2024-08-13 09:22:03,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2095830.0, ans=0.125 2024-08-13 09:22:09,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6700, loss[loss=0.0636, beats_loss=0.01126, ecapa_loss=0.0001573, whisper_loss=0.05077, over 13214.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001645, whisper_loss=0.0917, over 3891683.69 frames. ], batch size: 54, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:22:21,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2096030.0, ans=10.0 2024-08-13 09:22:22,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.48 vs. limit=10.0 2024-08-13 09:22:42,343 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 09:22:44,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2096130.0, ans=0.09899494936611666 2024-08-13 09:22:51,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2096230.0, ans=0.125 2024-08-13 09:22:53,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-13 09:23:02,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2096330.0, ans=0.09899494936611666 2024-08-13 09:23:03,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2096330.0, ans=0.125 2024-08-13 09:23:07,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.450e+01 2.665e+01 3.008e+01 5.668e+01, threshold=5.331e+01, percent-clipped=2.0 2024-08-13 09:23:10,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2096330.0, ans=0.125 2024-08-13 09:23:10,904 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-13 09:23:14,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6750, loss[loss=0.1056, beats_loss=0.008345, ecapa_loss=0.000173, whisper_loss=0.09557, over 20249.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001654, whisper_loss=0.09176, over 3898556.20 frames. ], batch size: 76, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:23:27,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-13 09:23:29,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2096530.0, ans=0.125 2024-08-13 09:23:32,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2096530.0, ans=0.0 2024-08-13 09:23:52,093 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 33 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 09:24:00,967 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 09:24:16,698 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 09:24:20,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6800, loss[loss=0.0955, beats_loss=0.01198, ecapa_loss=0.0001597, whisper_loss=0.08192, over 22556.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001673, whisper_loss=0.09201, over 3905600.18 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:24:21,749 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 09:24:24,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2096930.0, ans=0.125 2024-08-13 09:24:42,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2097030.0, ans=0.125 2024-08-13 09:24:49,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-13 09:24:50,239 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 09:24:54,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2097130.0, ans=0.04949747468305833 2024-08-13 09:25:06,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2097230.0, ans=0.0 2024-08-13 09:25:09,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2097230.0, ans=0.05 2024-08-13 09:25:11,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2097230.0, ans=0.0 2024-08-13 09:25:17,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2024-08-13 09:25:20,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.429e+01 2.619e+01 3.014e+01 5.255e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-13 09:25:27,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6850, loss[loss=0.08522, beats_loss=0.01307, ecapa_loss=0.0001849, whisper_loss=0.07029, over 20066.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001677, whisper_loss=0.09221, over 3906918.04 frames. ], batch size: 89, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:25:29,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2097430.0, ans=0.1 2024-08-13 09:25:37,329 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 09:25:38,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2097430.0, ans=0.2 2024-08-13 09:25:42,215 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 09:25:43,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2097530.0, ans=0.125 2024-08-13 09:25:46,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2097530.0, ans=0.2 2024-08-13 09:25:49,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2097530.0, ans=0.5 2024-08-13 09:25:49,970 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 09:25:50,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2097530.0, ans=0.125 2024-08-13 09:26:07,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2097730.0, ans=0.0 2024-08-13 09:26:08,290 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 09:26:12,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2097730.0, ans=0.1 2024-08-13 09:26:12,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2097730.0, ans=0.1 2024-08-13 09:26:13,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2097730.0, ans=0.1 2024-08-13 09:26:21,429 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 09:26:29,227 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 09:26:33,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6900, loss[loss=0.1234, beats_loss=0.006027, ecapa_loss=0.0001947, whisper_loss=0.1155, over 16981.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001675, whisper_loss=0.09248, over 3912427.53 frames. ], batch size: 64, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:26:53,597 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 09:26:56,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-13 09:26:57,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2098030.0, ans=0.125 2024-08-13 09:27:12,239 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 09:27:13,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2098230.0, ans=0.5 2024-08-13 09:27:15,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-13 09:27:16,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2098230.0, ans=0.125 2024-08-13 09:27:20,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-08-13 09:27:32,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.455e+01 2.903e+01 3.270e+01 5.847e+01, threshold=5.807e+01, percent-clipped=1.0 2024-08-13 09:27:39,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 6950, loss[loss=0.11, beats_loss=0.01081, ecapa_loss=0.0001831, whisper_loss=0.09736, over 22254.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01083, ecapa_loss=0.0001668, whisper_loss=0.09288, over 3928068.69 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:27:39,323 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 09:27:55,263 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 09:28:02,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2024-08-13 09:28:10,903 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-13 09:28:36,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2098830.0, ans=0.125 2024-08-13 09:28:38,153 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 09:28:39,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2098830.0, ans=0.0 2024-08-13 09:28:41,994 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 09:28:44,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7000, loss[loss=0.1029, beats_loss=0.00874, ecapa_loss=0.0001802, whisper_loss=0.09235, over 16043.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01079, ecapa_loss=0.0001677, whisper_loss=0.09279, over 3914616.24 frames. ], batch size: 60, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:28:46,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2098930.0, ans=0.025 2024-08-13 09:28:56,566 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 09:29:18,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2099130.0, ans=0.2 2024-08-13 09:29:34,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.51 vs. limit=15.0 2024-08-13 09:29:35,152 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 24 from Vox, 36 from AS 2024-08-13 09:29:42,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.399e+01 2.678e+01 3.214e+01 5.831e+01, threshold=5.356e+01, percent-clipped=1.0 2024-08-13 09:29:47,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2099330.0, ans=0.0 2024-08-13 09:29:49,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7050, loss[loss=0.09393, beats_loss=0.01112, ecapa_loss=0.0002053, whisper_loss=0.08075, over 19624.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.0001675, whisper_loss=0.0923, over 3912455.19 frames. ], batch size: 83, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:29:58,075 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 from AS 2024-08-13 09:30:12,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2099530.0, ans=10.0 2024-08-13 09:30:24,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2099630.0, ans=0.125 2024-08-13 09:30:53,859 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 09:30:55,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2024-08-13 09:30:59,034 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 from AS 2024-08-13 09:31:00,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7100, loss[loss=0.1021, beats_loss=0.01166, ecapa_loss=0.0001235, whisper_loss=0.08923, over 20318.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001676, whisper_loss=0.09187, over 3877214.92 frames.
], batch size: 77, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:31:04,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2099930.0, ans=0.125 2024-08-13 09:31:23,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-13 09:31:59,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2100330.0, ans=0.1 2024-08-13 09:32:06,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2100330.0, ans=0.5 2024-08-13 09:32:08,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.488e+01 2.756e+01 3.074e+01 1.860e+02, threshold=5.512e+01, percent-clipped=2.0 2024-08-13 09:32:11,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2100330.0, ans=0.125 2024-08-13 09:32:14,899 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7150, loss[loss=0.09985, beats_loss=0.01294, ecapa_loss=0.0001578, whisper_loss=0.08533, over 23055.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001671, whisper_loss=0.09174, over 3879523.28 frames. 
], batch size: 95, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:32:58,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2100630.0, ans=0.0 2024-08-13 09:32:59,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2100730.0, ans=0.0 2024-08-13 09:33:05,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2100730.0, ans=0.125 2024-08-13 09:33:21,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-13 09:33:25,653 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 33 from Vox, 30 from AS 2024-08-13 09:33:29,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7200, loss[loss=0.08033, beats_loss=0.009535, ecapa_loss=0.0001732, whisper_loss=0.06906, over 15726.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001676, whisper_loss=0.09197, over 3927952.23 frames.
], batch size: 60, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:33:35,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2100930.0, ans=0.0 2024-08-13 09:33:45,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2101030.0, ans=0.0 2024-08-13 09:33:50,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2101030.0, ans=0.125 2024-08-13 09:33:55,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2101030.0, ans=0.0 2024-08-13 09:34:16,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2101230.0, ans=0.0 2024-08-13 09:34:20,682 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 from AS 2024-08-13 09:34:20,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2101230.0, ans=0.1 2024-08-13 09:34:32,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2101330.0, ans=0.0 2024-08-13 09:34:38,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.408e+01 2.663e+01 2.960e+01 8.950e+01, threshold=5.327e+01, percent-clipped=1.0 2024-08-13 09:34:44,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7250, loss[loss=0.1159, beats_loss=0.01044, ecapa_loss=0.00016, whisper_loss=0.1038, over 19890.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001663, whisper_loss=0.09184, over 3917351.51 frames.
], batch size: 78, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:34:44,521 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 25 from Vox, 25 from AS 2024-08-13 09:34:46,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2101430.0, ans=0.125 2024-08-13 09:34:47,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2101430.0, ans=0.125 2024-08-13 09:35:01,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2101530.0, ans=0.125 2024-08-13 09:35:03,003 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 26 from Vox, 28 from AS 2024-08-13 09:35:08,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2101530.0, ans=0.125 2024-08-13 09:35:10,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2101530.0, ans=0.0 2024-08-13 09:35:21,143 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 09:35:32,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2101730.0, ans=0.0 2024-08-13 09:35:44,807 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 24 from Vox, 24 from AS 2024-08-13 09:35:49,048 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 from AS 2024-08-13 09:35:53,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2101830.0, ans=0.035 2024-08-13 09:35:55,127 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
34 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 09:35:56,396 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 31 from Vox, 43 from AS 2024-08-13 09:35:59,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7300, loss[loss=0.1204, beats_loss=0.01085, ecapa_loss=0.0001892, whisper_loss=0.1077, over 22986.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.000166, whisper_loss=0.09115, over 3920750.14 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:36:04,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2101930.0, ans=0.1 2024-08-13 09:36:10,762 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 from AS 2024-08-13 09:36:15,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2102030.0, ans=0.125 2024-08-13 09:36:25,377 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 09:36:38,809 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 from AS 2024-08-13 09:36:42,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2102130.0, ans=0.125 2024-08-13 09:36:50,861 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 09:36:53,487 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 28 from Vox, 24 from AS 2024-08-13 09:37:04,368 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 09:37:08,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.467e+01 2.644e+01 2.965e+01 8.104e+01, threshold=5.287e+01, percent-clipped=3.0 2024-08-13 09:37:14,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7350, loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001852, whisper_loss=0.09027, over 18253.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.0001658, whisper_loss=0.09145, over 3885699.23 frames. ], batch size: 75, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:37:36,219 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS 2024-08-13 09:37:38,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2102530.0, ans=0.125 2024-08-13 09:37:46,033 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:37:47,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2102630.0, ans=0.0 2024-08-13 09:38:29,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7400, loss[loss=0.1307, beats_loss=0.00886, ecapa_loss=0.0001745, whisper_loss=0.1201, over 23427.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001664, whisper_loss=0.09183, over 3900315.06 frames. ], batch size: 90, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:38:37,147 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 from AS 2024-08-13 09:38:39,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-13 09:38:45,711 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-13 09:38:51,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2103030.0, ans=0.0 2024-08-13 09:38:51,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2103030.0, ans=0.125 2024-08-13 09:38:56,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2024-08-13 09:39:03,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2103130.0, ans=0.125 2024-08-13 09:39:06,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-13 09:39:14,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2103230.0, ans=0.0 2024-08-13 09:39:30,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2103330.0, ans=0.0 2024-08-13 09:39:30,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.29 vs. limit=10.0 2024-08-13 09:39:37,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2103330.0, ans=0.2 2024-08-13 09:39:40,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.473e+01 2.699e+01 3.080e+01 4.653e+01, threshold=5.397e+01, percent-clipped=0.0 2024-08-13 09:39:46,018 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
19 from LS+wenet, 20 from Vox, 47 from AS 2024-08-13 09:39:47,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7450, loss[loss=0.08461, beats_loss=0.01351, ecapa_loss=0.0001214, whisper_loss=0.06989, over 22181.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001656, whisper_loss=0.09163, over 3920854.42 frames. ], batch size: 86, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:40:02,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2103530.0, ans=0.125 2024-08-13 09:40:10,505 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 09:40:15,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-13 09:40:18,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2103630.0, ans=0.07 2024-08-13 09:40:47,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2103830.0, ans=0.0 2024-08-13 09:40:48,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2103830.0, ans=0.1 2024-08-13 09:40:52,789 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 09:41:03,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7500, loss[loss=0.08557, beats_loss=0.01299, ecapa_loss=0.0001796, whisper_loss=0.07079, over 14280.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001667, whisper_loss=0.09235, over 3906546.59 frames.
], batch size: 60, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:41:12,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2024-08-13 09:41:26,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2104030.0, ans=0.0 2024-08-13 09:41:28,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2104030.0, ans=0.0 2024-08-13 09:41:30,420 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 09:41:44,908 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 from AS 2024-08-13 09:42:01,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2104330.0, ans=0.0 2024-08-13 09:42:01,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2024-08-13 09:42:11,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.360e+01 2.624e+01 2.937e+01 1.240e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-13 09:42:11,520 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 24 from Vox, 36 from AS 2024-08-13 09:42:15,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-08-13 09:42:15,763 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 09:42:17,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7550, loss[loss=0.113, beats_loss=0.009343, ecapa_loss=0.0001852, whisper_loss=0.1018, over 17356.00 frames.
], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001664, whisper_loss=0.09136, over 3843163.37 frames. ], batch size: 69, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:42:37,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2024-08-13 09:42:38,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2104530.0, ans=0.0 2024-08-13 09:43:03,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2104730.0, ans=15.0 2024-08-13 09:43:13,094 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 25 from Vox, 26 from AS 2024-08-13 09:43:23,671 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 09:43:25,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2104830.0, ans=0.125 2024-08-13 09:43:32,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7600, loss[loss=0.09747, beats_loss=0.01084, ecapa_loss=0.0001456, whisper_loss=0.08517, over 16567.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001667, whisper_loss=0.09057, over 3840070.82 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:43:44,375 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 from AS 2024-08-13 09:44:15,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.13 vs.
limit=15.0 2024-08-13 09:44:26,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2105230.0, ans=0.125 2024-08-13 09:44:41,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.428e+01 2.721e+01 3.053e+01 1.709e+02, threshold=5.443e+01, percent-clipped=2.0 2024-08-13 09:44:46,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7650, loss[loss=0.1206, beats_loss=0.008551, ecapa_loss=0.0001941, whisper_loss=0.1101, over 14883.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001665, whisper_loss=0.09151, over 3871835.30 frames. ], batch size: 58, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:44:47,055 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 from AS 2024-08-13 09:45:03,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2105530.0, ans=0.0 2024-08-13 09:45:21,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2105630.0, ans=0.125 2024-08-13 09:45:25,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2105630.0, ans=0.025 2024-08-13 09:45:32,579 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 from AS 2024-08-13 09:45:50,163 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 09:46:02,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7700, loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001647, whisper_loss=0.09006, over 14190.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001647, whisper_loss=0.09103, over 3895902.36 frames.
], batch size: 55, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:46:11,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-13 09:46:14,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-13 09:46:17,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2024-08-13 09:46:45,675 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 19 from LS+wenet, 32 from Vox, 44 from AS 2024-08-13 09:46:48,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2106230.0, ans=0.0 2024-08-13 09:46:52,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2106230.0, ans=0.0 2024-08-13 09:47:04,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2106330.0, ans=0.2 2024-08-13 09:47:05,253 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 24 from Vox, 24 from AS 2024-08-13 09:47:12,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.458e+01 2.712e+01 3.112e+01 4.115e+01, threshold=5.423e+01, percent-clipped=0.0 2024-08-13 09:47:12,549 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-13 09:47:18,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7750, loss[loss=0.1068, beats_loss=0.01119, ecapa_loss=0.0001628, whisper_loss=0.09397, over 19801.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001646, whisper_loss=0.09114, over 3889109.70 frames.
], batch size: 80, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:47:20,177 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 09:47:24,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2106430.0, ans=0.125 2024-08-13 09:47:43,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2106530.0, ans=0.125 2024-08-13 09:47:56,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2106630.0, ans=0.1 2024-08-13 09:48:19,148 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 from AS 2024-08-13 09:48:33,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=2106830.0, ans=22.5 2024-08-13 09:48:35,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7800, loss[loss=0.1203, beats_loss=0.008037, ecapa_loss=0.0002095, whisper_loss=0.1102, over 21663.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01071, ecapa_loss=0.0001649, whisper_loss=0.09234, over 3911427.68 frames.
], batch size: 88, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:49:34,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2107330.0, ans=0.0 2024-08-13 09:49:39,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2107330.0, ans=0.2 2024-08-13 09:49:39,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2107330.0, ans=0.0 2024-08-13 09:49:45,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.478e+01 2.776e+01 3.061e+01 6.531e+01, threshold=5.553e+01, percent-clipped=2.0 2024-08-13 09:49:51,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7850, loss[loss=0.1091, beats_loss=0.01241, ecapa_loss=0.0001498, whisper_loss=0.0952, over 23128.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01078, ecapa_loss=0.0001656, whisper_loss=0.0924, over 3917582.26 frames. ], batch size: 93, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:49:51,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2107430.0, ans=0.0 2024-08-13 09:50:04,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2107430.0, ans=0.2 2024-08-13 09:50:12,183 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 27 from LS+wenet, 22 from Vox, 15 from AS 2024-08-13 09:50:16,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. limit=6.0 2024-08-13 09:50:32,180 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
32 from LS+wenet, 17 from Vox, 44 from AS 2024-08-13 09:50:35,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2107730.0, ans=0.0 2024-08-13 09:50:42,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2107730.0, ans=0.1 2024-08-13 09:50:58,072 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 from AS 2024-08-13 09:51:00,857 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-13 09:51:08,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7900, loss[loss=0.09009, beats_loss=0.01264, ecapa_loss=0.0001348, whisper_loss=0.07611, over 23018.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001641, whisper_loss=0.09202, over 3941692.04 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:51:10,765 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.684e-02 2024-08-13 09:51:12,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2107930.0, ans=0.0 2024-08-13 09:51:22,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-13 09:51:27,727 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-13 09:51:34,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=22.5 2024-08-13 09:51:39,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2108130.0, ans=0.2 2024-08-13 09:52:19,113 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
26 from LS+wenet, 18 from Vox, 37 from AS 2024-08-13 09:52:19,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.346e+01 2.630e+01 3.151e+01 7.356e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-13 09:52:26,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 7950, loss[loss=0.1104, beats_loss=0.0107, ecapa_loss=0.0001842, whisper_loss=0.09784, over 23008.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001653, whisper_loss=0.09149, over 3948527.95 frames. ], batch size: 95, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:52:34,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2108430.0, ans=0.04949747468305833 2024-08-13 09:52:37,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2108430.0, ans=0.1 2024-08-13 09:52:44,883 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-13 09:52:50,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2108530.0, ans=0.125 2024-08-13 09:52:50,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-08-13 09:52:53,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2108530.0, ans=0.0 2024-08-13 09:52:57,025 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
24 from LS+wenet, 17 from Vox, 28 from AS 2024-08-13 09:53:01,552 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:53:28,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2108830.0, ans=0.2 2024-08-13 09:53:43,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2108830.0, ans=0.2 2024-08-13 09:53:45,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8000, loss[loss=0.09717, beats_loss=0.00903, ecapa_loss=0.0001707, whisper_loss=0.08644, over 22898.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001651, whisper_loss=0.09205, over 3945464.50 frames. ], batch size: 93, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:53:55,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2108930.0, ans=0.125 2024-08-13 09:54:29,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2109130.0, ans=0.0 2024-08-13 09:54:34,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2109230.0, ans=0.125 2024-08-13 09:54:35,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2109230.0, ans=0.125 2024-08-13 09:54:36,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-13 09:54:38,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2109230.0, ans=0.1 2024-08-13 09:54:42,551 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
27 from LS+wenet, 15 from Vox, 43 from AS 2024-08-13 09:54:52,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2109330.0, ans=0.125 2024-08-13 09:54:56,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.293e+01 2.578e+01 2.886e+01 4.471e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-13 09:55:02,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8050, loss[loss=0.09054, beats_loss=0.01373, ecapa_loss=0.000129, whisper_loss=0.07551, over 22861.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001647, whisper_loss=0.09185, over 3899041.55 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:56:04,754 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-13 09:56:20,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8100, loss[loss=0.09999, beats_loss=0.01301, ecapa_loss=0.0001477, whisper_loss=0.0855, over 23309.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001648, whisper_loss=0.09134, over 3895285.69 frames. ], batch size: 93, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:56:24,073 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 from AS 2024-08-13 09:56:29,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-13 09:56:44,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2110030.0, ans=0.125 2024-08-13 09:57:10,354 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-13 09:57:10,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2110230.0, ans=0.125 2024-08-13 09:57:10,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2110230.0, ans=0.09899494936611666 2024-08-13 09:57:15,970 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 09:57:21,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2110330.0, ans=0.2 2024-08-13 09:57:30,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.445e+01 2.691e+01 3.022e+01 6.409e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 09:57:36,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8150, loss[loss=0.08388, beats_loss=0.01251, ecapa_loss=0.0001487, whisper_loss=0.06989, over 19919.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001646, whisper_loss=0.09138, over 3893641.87 frames. ], batch size: 84, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:57:38,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2110430.0, ans=0.2 2024-08-13 09:57:39,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=12.0 2024-08-13 09:58:15,714 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-13 09:58:27,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2110730.0, ans=0.2 2024-08-13 09:58:44,560 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 09:58:48,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2110830.0, ans=0.015 2024-08-13 09:58:54,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2024-08-13 09:58:54,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8200, loss[loss=0.08547, beats_loss=0.009183, ecapa_loss=0.0001997, whisper_loss=0.07429, over 17681.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001657, whisper_loss=0.0912, over 3912925.95 frames. ], batch size: 72, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:59:00,055 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 29 from Vox, 28 from AS 2024-08-13 09:59:00,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-13 09:59:03,175 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 from AS 2024-08-13 09:59:10,481 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 from AS 2024-08-13 09:59:12,397 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
21 from LS+wenet, 26 from Vox, 43 from AS 2024-08-13 09:59:13,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2111030.0, ans=0.125 2024-08-13 09:59:24,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2111130.0, ans=0.0 2024-08-13 09:59:30,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2111130.0, ans=0.025 2024-08-13 09:59:36,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2111130.0, ans=0.0 2024-08-13 09:59:45,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2111230.0, ans=0.125 2024-08-13 09:59:55,842 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 10:00:02,625 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 15 from Vox, 37 from AS 2024-08-13 10:00:08,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.520e+01 2.689e+01 2.972e+01 4.311e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-13 10:00:09,782 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 10:00:10,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-13 10:00:14,738 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8250, loss[loss=0.08823, beats_loss=0.01192, ecapa_loss=0.0001619, whisper_loss=0.0747, over 19075.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01094, ecapa_loss=0.0001656, whisper_loss=0.09029, over 3892281.82 frames. 
], batch size: 74, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:00:20,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2111430.0, ans=0.025 2024-08-13 10:00:22,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2111430.0, ans=0.125 2024-08-13 10:00:53,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2111630.0, ans=0.0 2024-08-13 10:00:53,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2111630.0, ans=0.1 2024-08-13 10:01:03,254 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 10:01:06,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2111730.0, ans=0.125 2024-08-13 10:01:27,897 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 26 from Vox, 28 from AS 2024-08-13 10:01:28,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2111830.0, ans=0.0 2024-08-13 10:01:35,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8300, loss[loss=0.1072, beats_loss=0.01005, ecapa_loss=0.0002039, whisper_loss=0.09513, over 16589.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.000166, whisper_loss=0.09103, over 3903624.66 frames. ], batch size: 69, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:01:42,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. 
limit=15.0 2024-08-13 10:02:02,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2112030.0, ans=0.07 2024-08-13 10:02:08,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2112130.0, ans=0.2 2024-08-13 10:02:14,488 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 from AS 2024-08-13 10:02:37,443 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS 2024-08-13 10:02:46,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.390e+01 2.767e+01 3.084e+01 3.775e+01, threshold=5.535e+01, percent-clipped=0.0 2024-08-13 10:02:50,756 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:02:52,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8350, loss[loss=0.0878, beats_loss=0.009442, ecapa_loss=0.0001626, whisper_loss=0.07674, over 16203.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001648, whisper_loss=0.09075, over 3893518.92 frames. ], batch size: 64, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:03:11,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2112530.0, ans=0.0 2024-08-13 10:03:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2112530.0, ans=0.09899494936611666 2024-08-13 10:03:15,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-08-13 10:03:57,964 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 10:04:01,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2112830.0, ans=0.125 2024-08-13 10:04:10,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8400, loss[loss=0.1075, beats_loss=0.005815, ecapa_loss=0.0002256, whisper_loss=0.09943, over 14325.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.000165, whisper_loss=0.09088, over 3894958.70 frames. ], batch size: 55, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:04:14,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2112930.0, ans=0.125 2024-08-13 10:04:24,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2112930.0, ans=0.125 2024-08-13 10:04:26,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2113030.0, ans=0.125 2024-08-13 10:04:27,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2113030.0, ans=0.0 2024-08-13 10:04:51,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2113130.0, ans=0.0 2024-08-13 10:04:56,742 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 from AS 2024-08-13 10:05:00,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2113230.0, ans=0.125 2024-08-13 10:05:03,807 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 26 from Vox, 36 from AS 2024-08-13 10:05:09,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2113230.0, ans=0.2 2024-08-13 10:05:21,017 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 from AS 2024-08-13 10:05:22,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.471e+01 2.703e+01 3.041e+01 5.042e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-13 10:05:25,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2113330.0, ans=0.0 2024-08-13 10:05:28,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8450, loss[loss=0.09541, beats_loss=0.01026, ecapa_loss=0.0001891, whisper_loss=0.08326, over 22206.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001648, whisper_loss=0.09113, over 3862033.32 frames. ], batch size: 94, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:05:42,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-08-13 10:05:43,612 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.069e+00 2024-08-13 10:06:01,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2113630.0, ans=0.125 2024-08-13 10:06:05,084 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 25 from Vox, 22 from AS 2024-08-13 10:06:08,801 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
19 from LS+wenet, 24 from Vox, 48 from AS 2024-08-13 10:06:17,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2113730.0, ans=0.125 2024-08-13 10:06:37,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2113830.0, ans=0.125 2024-08-13 10:06:48,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8500, loss[loss=0.09705, beats_loss=0.009412, ecapa_loss=0.0001822, whisper_loss=0.08581, over 18300.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001653, whisper_loss=0.09026, over 3850644.77 frames. ], batch size: 74, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:06:59,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2113930.0, ans=0.0 2024-08-13 10:07:04,713 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS 2024-08-13 10:07:06,350 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 from AS 2024-08-13 10:07:25,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2114130.0, ans=0.0 2024-08-13 10:07:51,015 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 from AS 2024-08-13 10:07:55,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2114330.0, ans=0.125 2024-08-13 10:08:04,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.378e+01 2.649e+01 2.972e+01 5.253e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-13 10:08:10,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8550, loss[loss=0.118, beats_loss=0.01155, ecapa_loss=0.0001698, whisper_loss=0.1048, over 22637.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001649, whisper_loss=0.09064, over 3834034.46 frames. ], batch size: 89, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:08:12,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-13 10:08:23,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2114430.0, ans=0.1 2024-08-13 10:08:26,174 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 10:08:29,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2114530.0, ans=0.125 2024-08-13 10:08:34,587 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-13 10:09:16,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=12.0 2024-08-13 10:09:31,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8600, loss[loss=0.1147, beats_loss=0.01105, ecapa_loss=0.0001509, whisper_loss=0.1021, over 20134.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001642, whisper_loss=0.09107, over 3855131.50 frames. ], batch size: 80, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:09:33,104 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 from AS 2024-08-13 10:09:39,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2114930.0, ans=0.1 2024-08-13 10:09:42,473 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 15 from Vox, 49 from AS 2024-08-13 10:09:46,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2114930.0, ans=0.2 2024-08-13 10:09:47,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2115030.0, ans=0.125 2024-08-13 10:10:01,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-13 10:10:03,249 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS 2024-08-13 10:10:15,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-13 10:10:16,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2115130.0, ans=0.125 2024-08-13 10:10:22,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0 2024-08-13 10:10:37,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2115330.0, ans=0.0 2024-08-13 10:10:40,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-13 10:10:45,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.403e+01 2.760e+01 3.057e+01 6.734e+01, threshold=5.520e+01, percent-clipped=3.0 2024-08-13 10:10:48,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2024-08-13 10:10:49,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2115330.0, ans=0.125 2024-08-13 10:10:51,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8650, loss[loss=0.1308, beats_loss=0.007368, ecapa_loss=0.0001854, whisper_loss=0.1216, over 15114.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0109, ecapa_loss=0.0001643, whisper_loss=0.09109, over 3872528.88 frames. ], batch size: 60, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:10:53,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2115430.0, ans=0.0 2024-08-13 10:11:23,771 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:11:28,926 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 10:11:32,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2115630.0, ans=0.125 2024-08-13 10:11:38,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2115730.0, ans=10.0 2024-08-13 10:11:53,484 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 34 from Vox, 35 from AS 2024-08-13 10:11:54,937 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 30 from Vox, 37 from AS 2024-08-13 10:11:56,398 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 from AS 2024-08-13 10:12:08,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8700, loss[loss=0.08017, beats_loss=0.01105, ecapa_loss=0.0001361, whisper_loss=0.06776, over 15247.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01095, ecapa_loss=0.000165, whisper_loss=0.08989, over 3870192.06 frames. 
], batch size: 59, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:12:17,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2024-08-13 10:12:33,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2116030.0, ans=0.0 2024-08-13 10:12:39,060 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 10:12:44,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2116130.0, ans=0.1 2024-08-13 10:12:50,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-13 10:13:11,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2116230.0, ans=0.1 2024-08-13 10:13:15,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2116330.0, ans=0.125 2024-08-13 10:13:16,712 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 from AS 2024-08-13 10:13:24,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.443e+01 2.656e+01 3.130e+01 5.733e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-13 10:13:30,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8750, loss[loss=0.1055, beats_loss=0.008533, ecapa_loss=0.0001809, whisper_loss=0.09511, over 16323.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01088, ecapa_loss=0.0001653, whisper_loss=0.09004, over 3862619.16 frames. 
], batch size: 61, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:13:31,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2116430.0, ans=0.125 2024-08-13 10:13:36,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2116430.0, ans=0.07 2024-08-13 10:13:45,919 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS 2024-08-13 10:14:05,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2116630.0, ans=0.05 2024-08-13 10:14:09,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2116630.0, ans=0.1 2024-08-13 10:14:25,746 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 10:14:27,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2116730.0, ans=0.1 2024-08-13 10:14:49,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2116930.0, ans=0.125 2024-08-13 10:14:49,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8800, loss[loss=0.09433, beats_loss=0.0126, ecapa_loss=0.0001593, whisper_loss=0.08014, over 22516.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01094, ecapa_loss=0.0001646, whisper_loss=0.08977, over 3839465.59 frames. 
], batch size: 91, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:14:56,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2116930.0, ans=0.125 2024-08-13 10:15:29,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-13 10:15:41,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2117230.0, ans=0.125 2024-08-13 10:16:03,482 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 from AS 2024-08-13 10:16:06,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.410e+01 2.636e+01 2.976e+01 1.522e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 10:16:10,372 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 from AS 2024-08-13 10:16:13,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8850, loss[loss=0.08266, beats_loss=0.01241, ecapa_loss=0.0001304, whisper_loss=0.06894, over 17894.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01094, ecapa_loss=0.0001659, whisper_loss=0.0894, over 3838549.79 frames. ], batch size: 71, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:16:16,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2024-08-13 10:16:20,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2117430.0, ans=0.2 2024-08-13 10:16:26,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2117430.0, ans=0.125 2024-08-13 10:16:38,446 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 10:16:38,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2117530.0, ans=0.125 2024-08-13 10:16:38,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117530.0, ans=0.1 2024-08-13 10:16:44,304 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 from AS 2024-08-13 10:17:08,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2117730.0, ans=0.125 2024-08-13 10:17:14,879 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 from AS 2024-08-13 10:17:16,958 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 from AS 2024-08-13 10:17:22,241 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 19 from Vox, 44 from AS 2024-08-13 10:17:34,221 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8900, loss[loss=0.08715, beats_loss=0.01217, ecapa_loss=0.0001331, whisper_loss=0.07364, over 18602.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.0001656, whisper_loss=0.09024, over 3845785.35 frames. ], batch size: 72, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:17:36,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2117930.0, ans=0.1 2024-08-13 10:17:53,599 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 from AS 2024-08-13 10:18:48,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.342e+01 2.664e+01 2.910e+01 6.216e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 10:18:54,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 8950, loss[loss=0.08406, beats_loss=0.01206, ecapa_loss=0.000156, whisper_loss=0.07044, over 19382.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01097, ecapa_loss=0.0001642, whisper_loss=0.09057, over 3892367.79 frames. ], batch size: 78, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:19:23,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2118530.0, ans=0.125 2024-08-13 10:19:40,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2118730.0, ans=0.025 2024-08-13 10:19:42,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. limit=10.0 2024-08-13 10:20:13,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9000, loss[loss=0.1113, beats_loss=0.009231, ecapa_loss=0.0001562, whisper_loss=0.1005, over 16750.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01087, ecapa_loss=0.0001654, whisper_loss=0.09065, over 3881111.62 frames. ], batch size: 68, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:20:13,265 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 10:20:54,971 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 10:21:13,593 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on SV_voxceleb1: loss=0.004578, beats_loss=0, ecapa_loss=0.0004578, whisper_loss=0, over 939242.00 frames. 
2024-08-13 10:23:02,670 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 10:23:02,674 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 10:23:04,129 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 23 from Vox, 47 from AS 2024-08-13 10:23:10,661 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 10:23:48,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-13 10:24:01,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2119230.0, ans=0.0 2024-08-13 10:24:03,006 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 from AS 2024-08-13 10:24:08,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2119330.0, ans=0.0 2024-08-13 10:24:18,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.388e+01 2.773e+01 3.157e+01 5.459e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-13 10:24:23,728 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 10:24:24,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9050, loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001888, whisper_loss=0.08937, over 18262.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001646, whisper_loss=0.09114, over 3912332.60 frames. 
], batch size: 76, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:24:33,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2119430.0, ans=0.125 2024-08-13 10:24:46,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2119530.0, ans=0.07 2024-08-13 10:24:56,127 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 from AS 2024-08-13 10:24:56,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2119630.0, ans=0.1 2024-08-13 10:25:22,982 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-13 10:25:25,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-13 10:25:44,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9100, loss[loss=0.09057, beats_loss=0.01349, ecapa_loss=0.000131, whisper_loss=0.07576, over 18458.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.09124, over 3886723.85 frames. ], batch size: 74, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:26:03,077 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 24 from LS+wenet, 32 from Vox, 41 from AS 2024-08-13 10:26:10,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2120030.0, ans=0.0 2024-08-13 10:26:13,321 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 10:26:47,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=12.0 2024-08-13 10:26:47,781 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 10:26:48,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2120230.0, ans=0.0 2024-08-13 10:26:51,136 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-13 10:27:00,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2120330.0, ans=0.0 2024-08-13 10:27:02,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.637e+01 2.940e+01 4.647e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 10:27:08,452 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 10:27:09,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2120430.0, ans=0.125 2024-08-13 10:27:10,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9150, loss[loss=0.08967, beats_loss=0.01223, ecapa_loss=0.0001415, whisper_loss=0.07603, over 17539.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001648, whisper_loss=0.09201, over 3894691.96 frames. ], batch size: 72, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:27:11,880 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 10:27:17,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.61 vs. limit=10.0 2024-08-13 10:27:25,580 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 10:27:41,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2120630.0, ans=0.125 2024-08-13 10:27:52,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2120630.0, ans=0.1 2024-08-13 10:28:11,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2120730.0, ans=0.2 2024-08-13 10:28:16,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2120830.0, ans=0.0 2024-08-13 10:28:29,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9200, loss[loss=0.1187, beats_loss=0.007411, ecapa_loss=0.0001939, whisper_loss=0.1094, over 13735.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001649, whisper_loss=0.09204, over 3917229.21 frames. ], batch size: 53, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:28:50,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2121030.0, ans=0.1 2024-08-13 10:29:00,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2121130.0, ans=0.2 2024-08-13 10:29:02,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2121130.0, ans=0.125 2024-08-13 10:29:03,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2121130.0, ans=0.125 2024-08-13 10:29:18,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2121230.0, ans=0.125 2024-08-13 10:29:20,993 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2121230.0, ans=0.2 2024-08-13 10:29:22,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2121230.0, ans=0.0 2024-08-13 10:29:25,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2121230.0, ans=0.125 2024-08-13 10:29:37,446 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 10:29:41,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.412e+01 2.586e+01 2.944e+01 1.076e+02, threshold=5.171e+01, percent-clipped=1.0 2024-08-13 10:29:44,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2121330.0, ans=0.95 2024-08-13 10:29:47,928 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 10:29:48,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9250, loss[loss=0.1198, beats_loss=0.009571, ecapa_loss=0.0001573, whisper_loss=0.1086, over 23467.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001646, whisper_loss=0.09092, over 3916170.94 frames. ], batch size: 92, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:29:53,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-08-13 10:30:14,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2121530.0, ans=15.0 2024-08-13 10:30:16,282 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 10:30:19,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-08-13 10:30:41,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-13 10:30:47,156 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 10:30:59,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2121830.0, ans=0.0 2024-08-13 10:31:13,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9300, loss[loss=0.09375, beats_loss=0.01262, ecapa_loss=0.0001249, whisper_loss=0.07988, over 18995.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001649, whisper_loss=0.09118, over 3905667.74 frames. ], batch size: 73, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:31:41,743 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 10:31:43,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2122030.0, ans=0.2 2024-08-13 10:31:51,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2122130.0, ans=0.125 2024-08-13 10:31:51,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2122130.0, ans=0.0 2024-08-13 10:31:52,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2024-08-13 10:31:57,524 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-13 10:32:01,773 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 10:32:20,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2024-08-13 10:32:22,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2122330.0, ans=0.0 2024-08-13 10:32:27,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.387e+01 2.545e+01 2.935e+01 6.659e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-13 10:32:33,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2122430.0, ans=0.0 2024-08-13 10:32:34,606 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9350, loss[loss=0.09179, beats_loss=0.01337, ecapa_loss=0.0001213, whisper_loss=0.07721, over 16476.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001652, whisper_loss=0.09038, over 3908994.73 frames. ], batch size: 64, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:32:41,832 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 10:32:42,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2122430.0, ans=0.2 2024-08-13 10:32:49,570 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 10:32:50,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2122530.0, ans=0.125 2024-08-13 10:33:03,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2122530.0, ans=0.125 2024-08-13 10:33:20,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2122630.0, ans=0.125 2024-08-13 10:33:21,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2122630.0, ans=0.2 2024-08-13 10:33:26,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-08-13 10:33:30,271 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 10:33:43,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.27 vs. limit=6.0 2024-08-13 10:33:55,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9400, loss[loss=0.1038, beats_loss=0.01034, ecapa_loss=0.0001834, whisper_loss=0.09166, over 21688.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001651, whisper_loss=0.0908, over 3919449.22 frames. ], batch size: 92, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:33:56,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2122930.0, ans=0.0 2024-08-13 10:34:05,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2122930.0, ans=0.125 2024-08-13 10:34:40,484 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 10:34:49,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2123230.0, ans=0.125 2024-08-13 10:35:11,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.356e+01 2.664e+01 2.978e+01 5.324e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-13 10:35:12,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=2123330.0, ans=10.0 2024-08-13 10:35:17,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9450, loss[loss=0.1019, beats_loss=0.01241, ecapa_loss=0.0001638, whisper_loss=0.08787, over 21214.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001665, whisper_loss=0.09104, over 3852153.11 frames. ], batch size: 91, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:35:26,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2123430.0, ans=0.125 2024-08-13 10:35:29,260 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-13 10:35:33,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2123530.0, ans=0.2 2024-08-13 10:35:37,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.42 vs. 
limit=15.0 2024-08-13 10:35:43,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2123530.0, ans=0.025 2024-08-13 10:35:49,890 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.749e-01 2024-08-13 10:35:50,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=12.0 2024-08-13 10:35:57,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2123630.0, ans=0.125 2024-08-13 10:35:57,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2123630.0, ans=0.2 2024-08-13 10:35:59,156 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:35:59,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2123630.0, ans=0.0 2024-08-13 10:36:05,594 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 10:36:07,000 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 10:36:27,600 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 10:36:42,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9500, loss[loss=0.0941, beats_loss=0.01116, ecapa_loss=0.0001311, whisper_loss=0.08163, over 23003.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.0001661, whisper_loss=0.09045, over 3838101.94 frames. 
], batch size: 91, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:36:51,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2123930.0, ans=0.125 2024-08-13 10:36:51,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2123930.0, ans=0.0 2024-08-13 10:36:59,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2024-08-13 10:37:17,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2124030.0, ans=0.0 2024-08-13 10:37:22,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=15.0 2024-08-13 10:37:46,459 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 10:37:54,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2124230.0, ans=0.125 2024-08-13 10:38:26,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2124330.0, ans=0.04949747468305833 2024-08-13 10:38:27,419 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 10:38:28,719 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.383e+01 2.725e+01 3.152e+01 1.098e+02, threshold=5.450e+01, percent-clipped=1.0 2024-08-13 10:38:38,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9550, loss[loss=0.07393, beats_loss=0.01273, ecapa_loss=0.0001523, whisper_loss=0.05967, over 14649.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001666, whisper_loss=0.09025, over 3856557.68 frames. 
], batch size: 59, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:38:51,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2124430.0, ans=0.0 2024-08-13 10:39:05,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2124530.0, ans=0.125 2024-08-13 10:39:29,052 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 10:39:36,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2124630.0, ans=0.125 2024-08-13 10:39:42,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2124630.0, ans=0.125 2024-08-13 10:39:47,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=2124730.0, ans=12.0 2024-08-13 10:39:52,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2124730.0, ans=0.0 2024-08-13 10:39:53,674 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 10:40:09,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2124830.0, ans=0.125 2024-08-13 10:40:17,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2124830.0, ans=0.0 2024-08-13 10:40:19,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2124830.0, ans=0.07 2024-08-13 10:40:24,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2124830.0, ans=0.1 2024-08-13 10:40:27,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9600, loss[loss=0.0871, beats_loss=0.01009, ecapa_loss=0.0001929, whisper_loss=0.07509, over 16847.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01078, ecapa_loss=0.0001663, whisper_loss=0.09105, over 3875932.48 frames. ], batch size: 72, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:40:34,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2124930.0, ans=0.0 2024-08-13 10:41:13,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-08-13 10:41:23,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2125230.0, ans=0.1 2024-08-13 10:41:46,709 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 10:41:47,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.442e+01 2.705e+01 2.957e+01 4.182e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-13 10:41:55,724 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9650, loss[loss=0.1113, beats_loss=0.00828, ecapa_loss=0.0001266, whisper_loss=0.1017, over 17205.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001667, whisper_loss=0.09085, over 3861846.20 frames. ], batch size: 60, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:41:58,526 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-13 10:42:02,412 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 10:43:11,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2125830.0, ans=0.125 2024-08-13 10:43:27,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9700, loss[loss=0.09759, beats_loss=0.009453, ecapa_loss=0.0001789, whisper_loss=0.08635, over 15511.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001672, whisper_loss=0.09065, over 3825208.16 frames. ], batch size: 62, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:43:36,042 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-13 10:43:56,191 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 10:43:58,546 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 10:44:14,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. 
limit=15.0 2024-08-13 10:44:24,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.79 vs. limit=22.5 2024-08-13 10:45:09,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.435e+01 2.595e+01 3.006e+01 3.939e+01, threshold=5.189e+01, percent-clipped=0.0 2024-08-13 10:45:16,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9750, loss[loss=0.09749, beats_loss=0.008488, ecapa_loss=0.0001735, whisper_loss=0.08726, over 14994.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001668, whisper_loss=0.09115, over 3847949.76 frames. ], batch size: 57, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:45:34,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2126430.0, ans=0.1 2024-08-13 10:45:51,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2024-08-13 10:46:35,449 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 10:47:12,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9800, loss[loss=0.07611, beats_loss=0.0138, ecapa_loss=0.0001872, whisper_loss=0.06043, over 16327.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001667, whisper_loss=0.09078, over 3840047.20 frames. 
], batch size: 70, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:47:18,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2126930.0, ans=0.125 2024-08-13 10:47:24,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2126930.0, ans=0.2 2024-08-13 10:47:27,059 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-13 10:47:38,795 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 10:47:40,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2127030.0, ans=0.1 2024-08-13 10:48:24,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2127230.0, ans=0.0 2024-08-13 10:48:29,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.72 vs. limit=22.5 2024-08-13 10:48:45,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5 2024-08-13 10:49:04,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.353e+01 2.628e+01 3.072e+01 7.221e+01, threshold=5.255e+01, percent-clipped=1.0 2024-08-13 10:49:12,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9850, loss[loss=0.1241, beats_loss=0.01056, ecapa_loss=0.0001705, whisper_loss=0.1118, over 22711.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001661, whisper_loss=0.09033, over 3874827.85 frames. ], batch size: 90, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:49:23,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-13 10:49:55,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2024-08-13 10:50:04,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2127630.0, ans=0.0 2024-08-13 10:50:06,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2127630.0, ans=0.0 2024-08-13 10:50:13,658 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 10:50:33,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2127730.0, ans=0.0 2024-08-13 10:50:53,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2127830.0, ans=0.2 2024-08-13 10:50:54,802 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 10:50:55,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2127830.0, ans=0.2 2024-08-13 10:51:03,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2127830.0, ans=0.0 2024-08-13 10:51:03,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-13 10:51:05,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9900, loss[loss=0.1099, beats_loss=0.01232, ecapa_loss=0.0001681, whisper_loss=0.09594, over 22472.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0109, ecapa_loss=0.0001655, whisper_loss=0.09105, over 3900162.90 frames. 
], batch size: 91, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:51:15,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2127930.0, ans=0.1 2024-08-13 10:51:20,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2128030.0, ans=0.125 2024-08-13 10:51:25,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2128030.0, ans=0.0 2024-08-13 10:51:34,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2128030.0, ans=0.015 2024-08-13 10:51:58,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2024-08-13 10:52:18,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.402e+01 2.725e+01 3.042e+01 4.728e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 10:52:18,929 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 10:52:23,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 9950, loss[loss=0.09023, beats_loss=0.01157, ecapa_loss=0.0001882, whisper_loss=0.07678, over 18798.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01097, ecapa_loss=0.0001653, whisper_loss=0.0902, over 3876584.28 frames. ], batch size: 79, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:52:56,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2128630.0, ans=0.0 2024-08-13 10:53:06,604 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.731e-03 2024-08-13 10:53:10,862 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 10:53:19,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-13 10:53:39,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.01 vs. limit=22.5 2024-08-13 10:53:42,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10000, loss[loss=0.1149, beats_loss=0.009263, ecapa_loss=0.0001595, whisper_loss=0.1041, over 21220.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01096, ecapa_loss=0.0001656, whisper_loss=0.09036, over 3865503.32 frames. ], batch size: 84, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:53:44,357 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 10:53:54,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2128930.0, ans=0.125 2024-08-13 10:53:56,908 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-13 10:54:00,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2129030.0, ans=0.125 2024-08-13 10:54:03,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2129030.0, ans=0.125 2024-08-13 10:54:20,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:54:21,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2129130.0, ans=0.1 2024-08-13 10:54:33,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2129230.0, ans=0.125 2024-08-13 10:54:46,927 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 10:54:57,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.402e+01 2.704e+01 2.977e+01 9.053e+01, threshold=5.409e+01, percent-clipped=1.0 2024-08-13 10:55:02,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10050, loss[loss=0.08983, beats_loss=0.009462, ecapa_loss=0.000181, whisper_loss=0.07856, over 18294.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0109, ecapa_loss=0.0001654, whisper_loss=0.09048, over 3878553.43 frames. ], batch size: 72, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:55:03,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2129430.0, ans=0.0 2024-08-13 10:55:12,687 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 10:55:13,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2129430.0, ans=0.125 2024-08-13 10:55:13,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2129430.0, ans=0.0 2024-08-13 10:55:30,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2129530.0, ans=0.125 2024-08-13 10:55:36,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2129630.0, ans=0.0 2024-08-13 10:55:41,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=12.0 2024-08-13 10:55:47,636 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 10:55:50,209 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 10:55:55,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-13 10:55:58,688 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 10:56:11,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2129830.0, ans=0.1 2024-08-13 10:56:20,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2129830.0, ans=0.125 2024-08-13 10:56:23,690 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-13 10:56:25,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10100, loss[loss=0.1188, beats_loss=0.007381, ecapa_loss=0.0001878, whisper_loss=0.1096, over 18580.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001657, whisper_loss=0.09107, over 3890098.90 frames. ], batch size: 72, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:56:40,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2130030.0, ans=0.125 2024-08-13 10:56:42,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2130030.0, ans=0.0 2024-08-13 10:56:46,048 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 10:56:46,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2130030.0, ans=0.125 2024-08-13 10:56:56,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2130130.0, ans=0.05 2024-08-13 10:57:02,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2130130.0, ans=0.1 2024-08-13 10:57:04,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-08-13 10:57:11,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. 
limit=10.0 2024-08-13 10:57:30,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2130330.0, ans=0.125 2024-08-13 10:57:41,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2130330.0, ans=0.125 2024-08-13 10:57:42,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.392e+01 2.656e+01 2.956e+01 4.246e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-13 10:57:46,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10150, loss[loss=0.1118, beats_loss=0.01085, ecapa_loss=0.0001672, whisper_loss=0.09927, over 21430.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001654, whisper_loss=0.09117, over 3922231.68 frames. ], batch size: 86, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:57:53,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2130430.0, ans=0.0 2024-08-13 10:58:05,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2130530.0, ans=0.0 2024-08-13 10:58:11,460 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 10:58:12,634 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 10:58:20,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2130630.0, ans=10.0 2024-08-13 10:58:27,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2130630.0, ans=0.05 2024-08-13 10:58:32,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. 
limit=15.0 2024-08-13 10:58:42,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0 2024-08-13 10:58:49,893 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 10:59:06,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10200, loss[loss=0.08538, beats_loss=0.01176, ecapa_loss=0.0001713, whisper_loss=0.07191, over 18158.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.000165, whisper_loss=0.09115, over 3921551.94 frames. ], batch size: 74, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:59:27,907 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 10:59:33,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0 2024-08-13 10:59:35,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2131030.0, ans=0.1 2024-08-13 10:59:56,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2131230.0, ans=0.0 2024-08-13 11:00:00,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2131230.0, ans=0.125 2024-08-13 11:00:16,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2131330.0, ans=0.0 2024-08-13 11:00:22,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.425e+01 2.688e+01 3.008e+01 5.255e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 11:00:27,131 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10250, loss[loss=0.08806, beats_loss=0.01136, ecapa_loss=0.0001653, whisper_loss=0.07505, over 
15142.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001645, whisper_loss=0.09129, over 3902266.83 frames. ], batch size: 58, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:00:43,370 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 11:00:47,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2131530.0, ans=0.2 2024-08-13 11:00:49,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2131530.0, ans=0.125 2024-08-13 11:01:14,724 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 11:01:16,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2131730.0, ans=0.0 2024-08-13 11:01:16,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2131730.0, ans=0.1 2024-08-13 11:01:26,581 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 24 from Vox, 15 fro AS 2024-08-13 11:01:30,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2131730.0, ans=0.07 2024-08-13 11:01:46,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2131830.0, ans=0.1 2024-08-13 11:01:46,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-08-13 11:01:49,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10300, loss[loss=0.09787, beats_loss=0.009703, ecapa_loss=0.0001625, whisper_loss=0.08655, over 21725.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001663, whisper_loss=0.09124, over 3916927.08 frames. ], batch size: 87, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:01:50,786 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 11:01:55,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2131930.0, ans=0.125 2024-08-13 11:02:11,659 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 11:02:15,002 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 11:02:21,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2024-08-13 11:02:25,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2132130.0, ans=0.0 2024-08-13 11:02:40,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2132230.0, ans=0.0 2024-08-13 11:02:57,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2132330.0, ans=0.125 2024-08-13 11:03:03,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.417e+01 2.741e+01 3.040e+01 4.375e+02, threshold=5.481e+01, percent-clipped=2.0 2024-08-13 11:03:07,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10350, loss[loss=0.1163, beats_loss=0.01074, ecapa_loss=0.0001055, whisper_loss=0.1045, over 15079.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001665, whisper_loss=0.09104, over 3925862.45 frames. 
], batch size: 55, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:03:15,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2132430.0, ans=0.0 2024-08-13 11:03:41,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-13 11:04:24,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10400, loss[loss=0.1163, beats_loss=0.00746, ecapa_loss=0.0002215, whisper_loss=0.1066, over 17121.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001651, whisper_loss=0.09082, over 3889506.18 frames. ], batch size: 69, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:04:30,482 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 11:04:45,229 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 11:04:57,810 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 11:05:00,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2133130.0, ans=0.125 2024-08-13 11:05:01,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2133130.0, ans=0.0 2024-08-13 11:05:02,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2133130.0, ans=0.1 2024-08-13 11:05:13,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2133230.0, ans=0.125 2024-08-13 11:05:20,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2133230.0, ans=0.0 2024-08-13 11:05:37,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.409e+01 2.723e+01 2.969e+01 5.956e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-13 11:05:42,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10450, loss[loss=0.1011, beats_loss=0.01226, ecapa_loss=0.0001589, whisper_loss=0.08723, over 21140.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001664, whisper_loss=0.09059, over 3845464.51 frames. ], batch size: 85, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:05:42,569 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 11:05:49,810 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 11:05:52,779 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 11:05:58,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2133530.0, ans=0.125 2024-08-13 11:05:59,539 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 11:06:13,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2133630.0, ans=0.125 2024-08-13 11:06:20,350 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-13 11:06:23,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2133630.0, ans=0.2 2024-08-13 11:06:24,929 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-13 11:06:49,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2133830.0, ans=0.125 2024-08-13 11:06:53,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2133830.0, ans=0.0 2024-08-13 11:06:58,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10500, loss[loss=0.07273, beats_loss=0.01208, ecapa_loss=0.0001671, whisper_loss=0.05898, over 20745.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001671, whisper_loss=0.09077, over 3845368.92 frames. 
], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:07:06,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2133930.0, ans=0.125 2024-08-13 11:07:06,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2133930.0, ans=22.5 2024-08-13 11:07:11,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2133930.0, ans=0.125 2024-08-13 11:07:12,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=22.5 2024-08-13 11:07:16,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2134030.0, ans=0.125 2024-08-13 11:07:23,510 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 11:07:31,123 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 11:07:39,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2134130.0, ans=0.1 2024-08-13 11:08:08,300 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-13 11:08:12,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.440e+01 2.652e+01 2.992e+01 8.819e+01, threshold=5.304e+01, percent-clipped=1.0 2024-08-13 11:08:17,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10550, loss[loss=0.09105, beats_loss=0.01163, ecapa_loss=0.0002077, whisper_loss=0.07734, over 20842.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001668, whisper_loss=0.09031, over 3827048.29 frames. 
], batch size: 92, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:08:37,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2134530.0, ans=0.0 2024-08-13 11:08:51,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-08-13 11:09:07,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2134730.0, ans=0.95 2024-08-13 11:09:13,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2134730.0, ans=0.0 2024-08-13 11:09:13,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-13 11:09:37,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2134930.0, ans=0.125 2024-08-13 11:09:38,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10600, loss[loss=0.09998, beats_loss=0.009525, ecapa_loss=0.0001747, whisper_loss=0.08871, over 15554.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.000166, whisper_loss=0.09071, over 3855730.46 frames. 
], batch size: 63, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:09:52,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2135030.0, ans=0.125 2024-08-13 11:10:30,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2135230.0, ans=0.1 2024-08-13 11:10:36,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2135230.0, ans=0.0 2024-08-13 11:10:46,689 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 11:10:52,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.455e+01 2.918e+01 3.137e+01 4.464e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-13 11:10:57,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10650, loss[loss=0.1188, beats_loss=0.01158, ecapa_loss=0.000159, whisper_loss=0.1057, over 22439.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001648, whisper_loss=0.09171, over 3891658.31 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:11:25,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2135530.0, ans=0.125 2024-08-13 11:12:00,349 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 11:12:15,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10700, loss[loss=0.1113, beats_loss=0.01027, ecapa_loss=0.0001846, whisper_loss=0.09918, over 20809.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001642, whisper_loss=0.09152, over 3897514.83 frames. ], batch size: 83, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:12:34,674 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
29 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 11:12:35,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-13 11:12:37,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2136030.0, ans=0.125 2024-08-13 11:12:40,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2136030.0, ans=0.025 2024-08-13 11:12:43,194 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 11:12:46,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2136130.0, ans=0.125 2024-08-13 11:12:49,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2136130.0, ans=0.125 2024-08-13 11:13:10,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2136230.0, ans=0.125 2024-08-13 11:13:15,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2024-08-13 11:13:26,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.457e+01 2.823e+01 3.286e+01 3.691e+02, threshold=5.645e+01, percent-clipped=1.0 2024-08-13 11:13:31,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10750, loss[loss=0.131, beats_loss=0.00809, ecapa_loss=0.0002022, whisper_loss=0.1209, over 13580.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001642, whisper_loss=0.09156, over 3894222.16 frames. 
], batch size: 54, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:13:34,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2136430.0, ans=0.1 2024-08-13 11:13:38,917 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-13 11:13:44,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2136430.0, ans=0.125 2024-08-13 11:13:51,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2136530.0, ans=0.125 2024-08-13 11:14:06,035 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-13 11:14:10,025 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 11:14:15,353 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 11:14:21,587 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 11:14:30,566 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 11:14:37,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2136830.0, ans=0.125 2024-08-13 11:14:47,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10800, loss[loss=0.1055, beats_loss=0.009861, ecapa_loss=0.0001659, whisper_loss=0.09394, over 17601.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001648, whisper_loss=0.09214, over 3907489.95 frames. ], batch size: 69, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:15:08,933 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 11:15:17,907 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 11:15:45,179 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 11:15:49,261 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 11:15:49,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2137330.0, ans=0.1 2024-08-13 11:15:52,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2137330.0, ans=0.04949747468305833 2024-08-13 11:15:56,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.544e+01 2.753e+01 3.369e+01 1.648e+02, threshold=5.506e+01, percent-clipped=4.0 2024-08-13 11:16:00,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10850, loss[loss=0.09575, beats_loss=0.01361, ecapa_loss=0.0001426, whisper_loss=0.08071, over 15451.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001653, whisper_loss=0.09268, over 3909768.31 frames. ], batch size: 60, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:16:01,066 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 11:16:12,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-13 11:16:22,335 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 11:16:37,132 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 11:16:48,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2137730.0, ans=0.0 2024-08-13 11:16:51,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2137730.0, ans=0.0 2024-08-13 11:17:16,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10900, loss[loss=0.09537, beats_loss=0.0136, ecapa_loss=0.0001392, whisper_loss=0.08038, over 22866.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001653, whisper_loss=0.09274, over 3910743.67 frames. ], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:17:24,649 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 11:17:34,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2138030.0, ans=0.125 2024-08-13 11:18:13,904 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 11:18:19,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2138330.0, ans=0.0 2024-08-13 11:18:26,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.488e+01 2.800e+01 3.283e+01 5.415e+01, threshold=5.600e+01, percent-clipped=0.0 2024-08-13 11:18:27,802 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-13 11:18:30,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 10950, loss[loss=0.1238, beats_loss=0.01026, ecapa_loss=0.0001565, whisper_loss=0.112, over 22666.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001655, whisper_loss=0.09254, over 3894159.41 frames. 
], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:18:34,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:18:36,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:18:42,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2138430.0, ans=6.0 2024-08-13 11:18:44,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2138430.0, ans=0.1 2024-08-13 11:18:45,367 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 11:18:48,583 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 11:19:15,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2024-08-13 11:19:24,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2138730.0, ans=0.1 2024-08-13 11:19:32,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2138830.0, ans=0.1 2024-08-13 11:19:41,106 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 11:19:48,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11000, loss[loss=0.09594, beats_loss=0.01254, ecapa_loss=0.0001445, whisper_loss=0.08196, over 21098.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01076, ecapa_loss=0.0001659, whisper_loss=0.09243, over 3921965.31 frames. 
], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:20:06,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2139030.0, ans=0.125 2024-08-13 11:20:11,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2024-08-13 11:20:14,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2139030.0, ans=10.0 2024-08-13 11:20:15,171 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 11:20:47,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2139330.0, ans=0.0 2024-08-13 11:20:50,322 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 20 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-13 11:20:54,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2139330.0, ans=0.125 2024-08-13 11:20:58,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.481e+01 2.729e+01 3.286e+01 1.330e+02, threshold=5.458e+01, percent-clipped=4.0 2024-08-13 11:21:03,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11050, loss[loss=0.1062, beats_loss=0.01005, ecapa_loss=0.0001492, whisper_loss=0.09461, over 18880.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001661, whisper_loss=0.09173, over 3906714.19 frames. 
], batch size: 71, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:21:04,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2139430.0, ans=0.0 2024-08-13 11:21:08,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2139430.0, ans=0.125 2024-08-13 11:21:21,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2139530.0, ans=0.125 2024-08-13 11:21:34,352 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 11:22:06,142 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-13 11:22:10,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2139730.0, ans=0.0 2024-08-13 11:22:18,700 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 11:22:38,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11100, loss[loss=0.1044, beats_loss=0.01028, ecapa_loss=0.0001949, whisper_loss=0.09222, over 22704.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001662, whisper_loss=0.09246, over 3923737.84 frames. 
], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:22:59,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2140030.0, ans=0.125 2024-08-13 11:23:21,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2140130.0, ans=0.0 2024-08-13 11:23:32,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2140130.0, ans=0.0 2024-08-13 11:23:38,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2140230.0, ans=0.1 2024-08-13 11:24:11,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.487e+01 2.717e+01 3.069e+01 5.884e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 11:24:16,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11150, loss[loss=0.1051, beats_loss=0.0115, ecapa_loss=0.0001735, whisper_loss=0.0919, over 13555.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.0001662, whisper_loss=0.09216, over 3885692.98 frames. ], batch size: 56, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:24:21,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2140430.0, ans=0.125 2024-08-13 11:25:02,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2140730.0, ans=0.125 2024-08-13 11:25:12,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2140730.0, ans=0.2 2024-08-13 11:25:30,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11200, loss[loss=0.1074, beats_loss=0.01166, ecapa_loss=0.0001645, whisper_loss=0.09405, over 17858.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01067, ecapa_loss=0.0001665, whisper_loss=0.09229, over 3866261.26 frames. ], batch size: 71, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:25:34,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2140930.0, ans=0.0 2024-08-13 11:25:43,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2141030.0, ans=0.125 2024-08-13 11:26:17,423 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 11:26:24,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2141230.0, ans=0.125 2024-08-13 11:26:30,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2141330.0, ans=0.125 2024-08-13 11:26:39,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.420e+01 2.628e+01 2.915e+01 3.904e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-13 11:26:43,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11250, loss[loss=0.1069, beats_loss=0.009381, ecapa_loss=0.0001759, whisper_loss=0.09574, over 22107.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001652, whisper_loss=0.09212, over 3871567.99 frames. 
], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:26:51,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2141430.0, ans=0.1 2024-08-13 11:27:11,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2141530.0, ans=0.0 2024-08-13 11:27:19,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2141630.0, ans=0.125 2024-08-13 11:27:21,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2141630.0, ans=0.125 2024-08-13 11:27:27,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-13 11:27:31,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2141730.0, ans=0.125 2024-08-13 11:27:33,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2141730.0, ans=0.0 2024-08-13 11:27:43,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2141830.0, ans=0.0 2024-08-13 11:27:45,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. limit=5.0 2024-08-13 11:27:49,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2141830.0, ans=0.0 2024-08-13 11:27:56,462 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 11:27:57,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11300, loss[loss=0.1118, beats_loss=0.01003, ecapa_loss=0.0001618, whisper_loss=0.1001, over 23160.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.09183, over 3880796.59 frames. ], batch size: 91, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:28:06,604 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 11:28:08,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2141930.0, ans=0.125 2024-08-13 11:28:13,969 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 11:28:14,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2142030.0, ans=0.0 2024-08-13 11:28:25,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2142030.0, ans=0.125 2024-08-13 11:28:29,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2142130.0, ans=0.07 2024-08-13 11:28:45,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2142230.0, ans=0.0 2024-08-13 11:28:57,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2142330.0, ans=0.2 2024-08-13 11:29:06,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.518e+01 2.742e+01 3.086e+01 4.928e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-13 11:29:10,339 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-13 11:29:11,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11350, loss[loss=0.1041, beats_loss=0.009752, ecapa_loss=0.0002102, whisper_loss=0.09222, over 21215.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.000165, whisper_loss=0.09191, over 3898430.01 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:29:20,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2142430.0, ans=0.1 2024-08-13 11:29:25,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.24 vs. limit=22.5 2024-08-13 11:29:38,800 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.751e-03 2024-08-13 11:29:39,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2142630.0, ans=0.0 2024-08-13 11:29:41,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2142630.0, ans=0.125 2024-08-13 11:29:47,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2142630.0, ans=0.125 2024-08-13 11:30:04,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2142730.0, ans=0.1 2024-08-13 11:30:20,783 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:30:23,210 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 11:30:25,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11400, loss[loss=0.07389, beats_loss=0.009287, ecapa_loss=0.0001465, whisper_loss=0.06314, over 14283.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001647, whisper_loss=0.09209, over 3869316.34 frames. ], batch size: 53, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:30:31,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2142930.0, ans=0.1 2024-08-13 11:30:54,135 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 11:31:12,945 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 11:31:16,706 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 11:31:19,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-13 11:31:36,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.547e+01 2.847e+01 3.262e+01 4.632e+01, threshold=5.695e+01, percent-clipped=0.0 2024-08-13 11:31:42,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11450, loss[loss=0.09828, beats_loss=0.01296, ecapa_loss=0.0001649, whisper_loss=0.08367, over 21839.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.0001654, whisper_loss=0.0922, over 3888610.46 frames. ], batch size: 88, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:31:59,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.46 vs. 
limit=22.5 2024-08-13 11:32:01,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2143530.0, ans=0.0 2024-08-13 11:32:37,596 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.586e+01 2024-08-13 11:32:43,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 11:32:46,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.67 vs. limit=22.5 2024-08-13 11:32:47,266 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 11:32:49,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2143830.0, ans=0.125 2024-08-13 11:33:00,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11500, loss[loss=0.08609, beats_loss=0.01235, ecapa_loss=0.0001752, whisper_loss=0.07198, over 16951.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001653, whisper_loss=0.09244, over 3885893.31 frames. ], batch size: 71, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:33:11,513 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-13 11:33:15,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2144030.0, ans=0.125 2024-08-13 11:33:34,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2144130.0, ans=0.125 2024-08-13 11:33:38,837 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
19 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 11:33:41,632 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 11:33:58,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2024-08-13 11:34:03,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2144330.0, ans=0.04949747468305833 2024-08-13 11:34:05,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-08-13 11:34:08,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2144330.0, ans=0.2 2024-08-13 11:34:10,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.467e+01 2.720e+01 3.175e+01 4.456e+01, threshold=5.439e+01, percent-clipped=0.0 2024-08-13 11:34:14,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11550, loss[loss=0.1188, beats_loss=0.009308, ecapa_loss=0.0001638, whisper_loss=0.1078, over 22649.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0106, ecapa_loss=0.0001665, whisper_loss=0.0935, over 3893306.06 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:34:16,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2144430.0, ans=0.125 2024-08-13 11:34:27,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2144430.0, ans=0.0 2024-08-13 11:34:37,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. 
limit=6.0 2024-08-13 11:34:38,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2144530.0, ans=0.125 2024-08-13 11:34:47,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2144630.0, ans=0.125 2024-08-13 11:34:55,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2144630.0, ans=0.125 2024-08-13 11:34:56,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2024-08-13 11:35:26,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2144830.0, ans=0.125 2024-08-13 11:35:29,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11600, loss[loss=0.1164, beats_loss=0.009203, ecapa_loss=0.0002261, whisper_loss=0.1049, over 21071.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01062, ecapa_loss=0.0001666, whisper_loss=0.09332, over 3916128.53 frames. ], batch size: 92, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:35:36,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2144930.0, ans=0.125 2024-08-13 11:35:43,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2145030.0, ans=0.125 2024-08-13 11:35:49,999 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 11:36:04,117 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 11:36:08,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2145130.0, ans=0.125 2024-08-13 11:36:12,130 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 11:36:19,111 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 11:36:30,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2145330.0, ans=0.125 2024-08-13 11:36:31,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2145330.0, ans=0.2 2024-08-13 11:36:34,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-13 11:36:35,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2145330.0, ans=0.2 2024-08-13 11:36:37,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.430e+01 2.771e+01 3.076e+01 5.105e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 11:36:41,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-13 11:36:41,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11650, loss[loss=0.09884, beats_loss=0.01708, ecapa_loss=0.0001572, whisper_loss=0.08019, over 16885.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01065, ecapa_loss=0.0001659, whisper_loss=0.09287, over 3899314.26 frames. 
], batch size: 68, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:36:43,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-13 11:37:02,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2145530.0, ans=0.125 2024-08-13 11:37:17,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2145630.0, ans=0.07 2024-08-13 11:37:17,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2145630.0, ans=0.125 2024-08-13 11:37:20,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=12.0 2024-08-13 11:37:21,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2145630.0, ans=0.125 2024-08-13 11:37:26,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2145730.0, ans=0.0 2024-08-13 11:37:51,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2145830.0, ans=0.1 2024-08-13 11:37:57,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11700, loss[loss=0.1361, beats_loss=0.008111, ecapa_loss=0.0002062, whisper_loss=0.126, over 16873.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01076, ecapa_loss=0.0001658, whisper_loss=0.09252, over 3922413.58 frames. ], batch size: 67, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:38:06,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.68 vs. 
limit=10.0 2024-08-13 11:38:54,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2146230.0, ans=0.125 2024-08-13 11:38:56,888 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 11:39:07,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.516e+01 2.793e+01 3.243e+01 6.496e+01, threshold=5.587e+01, percent-clipped=2.0 2024-08-13 11:39:11,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11750, loss[loss=0.09838, beats_loss=0.01096, ecapa_loss=0.0001443, whisper_loss=0.08597, over 20133.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01086, ecapa_loss=0.0001648, whisper_loss=0.09203, over 3926947.84 frames. ], batch size: 80, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:39:19,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2146430.0, ans=0.0 2024-08-13 11:40:07,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2146730.0, ans=0.0 2024-08-13 11:40:08,296 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-13 11:40:17,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0 2024-08-13 11:40:18,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2146830.0, ans=0.2 2024-08-13 11:40:23,404 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11800, loss[loss=0.1105, beats_loss=0.009229, ecapa_loss=0.0001485, whisper_loss=0.09976, over 15937.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001647, whisper_loss=0.09244, over 3939288.34 frames. 
], batch size: 61, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:40:27,369 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 11:40:29,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2146930.0, ans=0.1 2024-08-13 11:40:30,137 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 11:40:41,610 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 11:40:42,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2024-08-13 11:40:44,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2147030.0, ans=0.0 2024-08-13 11:40:54,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2147130.0, ans=0.0 2024-08-13 11:40:59,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2147130.0, ans=0.0 2024-08-13 11:41:29,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.422e+01 2.679e+01 2.998e+01 8.058e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-13 11:41:30,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2147330.0, ans=0.125 2024-08-13 11:41:33,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11850, loss[loss=0.0748, beats_loss=0.0147, ecapa_loss=0.0001485, whisper_loss=0.05862, over 18951.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001654, whisper_loss=0.09182, over 3944483.69 frames. 
], batch size: 79, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:41:37,531 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 11:42:05,780 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 11:42:16,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.14 vs. limit=22.5 2024-08-13 11:42:31,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2147830.0, ans=0.0 2024-08-13 11:42:33,741 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 11:42:42,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11900, loss[loss=0.1128, beats_loss=0.01156, ecapa_loss=0.0001526, whisper_loss=0.09975, over 21963.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01086, ecapa_loss=0.0001645, whisper_loss=0.09233, over 3960477.86 frames. ], batch size: 86, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:42:44,500 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 11:42:46,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2147930.0, ans=0.0 2024-08-13 11:42:55,270 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 11:43:21,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2148130.0, ans=0.0 2024-08-13 11:43:24,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2148230.0, ans=0.1 2024-08-13 11:43:27,017 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 11:43:29,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2148230.0, ans=0.1 2024-08-13 11:43:43,265 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-13 11:43:45,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2148330.0, ans=0.2 2024-08-13 11:43:47,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.622e+01 2.921e+01 5.658e+01, threshold=5.245e+01, percent-clipped=1.0 2024-08-13 11:43:51,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 11950, loss[loss=0.1144, beats_loss=0.009496, ecapa_loss=0.000194, whisper_loss=0.1029, over 19681.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01079, ecapa_loss=0.0001644, whisper_loss=0.09282, over 3914795.49 frames. ], batch size: 78, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:03,552 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 11:44:05,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2148530.0, ans=0.0 2024-08-13 11:44:14,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2148530.0, ans=0.125 2024-08-13 11:44:39,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2148730.0, ans=0.2 2024-08-13 11:44:44,168 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 11:44:53,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2148830.0, ans=0.125 2024-08-13 11:44:57,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12000, loss[loss=0.1137, beats_loss=0.009449, ecapa_loss=0.0001826, whisper_loss=0.1025, over 20402.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01082, ecapa_loss=0.0001644, whisper_loss=0.09215, over 3887207.38 frames. ], batch size: 82, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:57,276 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 11:45:36,637 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005616, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 11:45:55,749 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on SV_voxceleb1: loss=0.004517, beats_loss=0, ecapa_loss=0.0004517, whisper_loss=0, over 939242.00 frames. 2024-08-13 11:47:56,532 INFO [train_multi_KD3.py:1149] (3/4) Epoch 15, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 11:47:56,536 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 11:48:12,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. 
limit=22.5 2024-08-13 11:48:23,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2149130.0, ans=0.125 2024-08-13 11:48:33,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2149130.0, ans=0.125 2024-08-13 11:48:38,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2149230.0, ans=0.0 2024-08-13 11:48:59,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.423e+01 2.671e+01 3.267e+01 7.662e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 11:49:03,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12050, loss[loss=0.09898, beats_loss=0.00991, ecapa_loss=0.0001721, whisper_loss=0.08735, over 16340.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001654, whisper_loss=0.09177, over 3863208.48 frames. ], batch size: 65, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:49:09,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2149430.0, ans=0.125 2024-08-13 11:49:14,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-13 11:49:32,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2149630.0, ans=0.125 2024-08-13 11:49:40,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. 
limit=12.0 2024-08-13 11:49:41,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2149730.0, ans=0.0 2024-08-13 11:49:49,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2024-08-13 11:49:51,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2149730.0, ans=0.125 2024-08-13 11:50:07,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12100, loss[loss=0.1203, beats_loss=0.009405, ecapa_loss=0.0001425, whisper_loss=0.1095, over 23174.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.000165, whisper_loss=0.09126, over 3876003.71 frames. ], batch size: 89, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:50:08,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=15.0
2024-08-13 11:50:10,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2149930.0, ans=0.0
2024-08-13 11:50:13,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2149930.0, ans=0.2
2024-08-13 11:50:21,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2150030.0, ans=0.1
2024-08-13 11:50:30,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2150030.0, ans=0.125
2024-08-13 11:50:36,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2150130.0, ans=0.125
2024-08-13 11:50:53,371 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 33 from LS+wenet, 18 from Vox, 20 from AS
2024-08-13 11:51:08,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.393e+01 2.671e+01 2.986e+01 4.532e+01, threshold=5.343e+01, percent-clipped=0.0
2024-08-13 11:51:11,616 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 from AS
2024-08-13 11:51:12,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12150, loss[loss=0.11, beats_loss=0.01075, ecapa_loss=0.0001527, whisper_loss=0.09772, over 15479.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001631, whisper_loss=0.09097, over 3872412.94 frames. ], batch size: 61, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:51:18,412 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 11:51:20,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2150430.0, ans=0.125
2024-08-13 11:51:28,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2150530.0, ans=0.09899494936611666
2024-08-13 11:51:30,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2150530.0, ans=0.05
2024-08-13 11:51:35,812 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 from AS
2024-08-13 11:52:02,840 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 11:52:09,049 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 from AS
2024-08-13 11:52:19,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12200, loss[loss=0.1364, beats_loss=0.008214, ecapa_loss=0.0001938, whisper_loss=0.1263, over 18921.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0109, ecapa_loss=0.0001635, whisper_loss=0.09108, over 3860730.17 frames. ], batch size: 75, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:52:22,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5
2024-08-13 11:52:30,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2150930.0, ans=0.125
2024-08-13 11:53:08,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0
2024-08-13 11:53:11,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0
2024-08-13 11:53:21,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.490e+01 2.780e+01 3.147e+01 4.927e+01, threshold=5.560e+01, percent-clipped=0.0
2024-08-13 11:53:25,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12250, loss[loss=0.1115, beats_loss=0.009336, ecapa_loss=0.0001266, whisper_loss=0.1009, over 24410.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001627, whisper_loss=0.09161, over 3880962.16 frames. ], batch size: 89, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:53:56,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2151630.0, ans=0.1
2024-08-13 11:54:15,020 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS
2024-08-13 11:54:16,241 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-13 11:54:30,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12300, loss[loss=0.1059, beats_loss=0.008809, ecapa_loss=0.0001916, whisper_loss=0.0952, over 14896.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001641, whisper_loss=0.09152, over 3867661.48 frames. ], batch size: 62, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:54:35,016 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 from AS
2024-08-13 11:54:36,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2151930.0, ans=0.2
2024-08-13 11:54:40,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2024-08-13 11:54:41,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2151930.0, ans=0.0
2024-08-13 11:54:41,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2151930.0, ans=0.125
2024-08-13 11:54:51,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2152030.0, ans=0.125
2024-08-13 11:54:53,595 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 11:55:18,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2152230.0, ans=0.125
2024-08-13 11:55:19,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2152230.0, ans=0.0
2024-08-13 11:55:24,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2152330.0, ans=0.125
2024-08-13 11:55:32,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.395e+01 2.675e+01 2.989e+01 4.697e+01, threshold=5.351e+01, percent-clipped=0.0
2024-08-13 11:55:33,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0
2024-08-13 11:55:36,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12350, loss[loss=0.1218, beats_loss=0.01006, ecapa_loss=0.0001661, whisper_loss=0.11, over 21819.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001644, whisper_loss=0.09182, over 3873359.58 frames. ], batch size: 87, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:55:48,126 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 11:55:52,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.562e-02
2024-08-13 11:55:59,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=2152530.0, ans=0.1
2024-08-13 11:56:00,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2152530.0, ans=0.0
2024-08-13 11:56:06,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2152630.0, ans=0.125
2024-08-13 11:56:36,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2152830.0, ans=0.125
2024-08-13 11:56:36,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2152830.0, ans=10.0
2024-08-13 11:56:40,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12400, loss[loss=0.1301, beats_loss=0.0104, ecapa_loss=0.0001583, whisper_loss=0.1181, over 15990.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001637, whisper_loss=0.09127, over 3866825.74 frames. ], batch size: 63, lr: 4.14e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:56:50,109 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 from AS
2024-08-13 11:56:50,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2152930.0, ans=0.125
2024-08-13 11:57:00,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2153030.0, ans=0.2
2024-08-13 11:57:08,574 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 from AS
2024-08-13 11:57:13,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2153130.0, ans=0.125
2024-08-13 11:57:21,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=10.0
2024-08-13 11:57:22,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0
2024-08-13 11:57:37,820 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS
2024-08-13 11:57:40,583 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 31 from Vox, 33 from AS
2024-08-13 11:57:40,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2153330.0, ans=0.125
2024-08-13 11:57:43,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.413e+01 2.568e+01 2.884e+01 5.690e+01, threshold=5.135e+01, percent-clipped=1.0
2024-08-13 11:57:44,933 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 11:57:47,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12450, loss[loss=0.1123, beats_loss=0.01095, ecapa_loss=0.0001584, whisper_loss=0.09973, over 22094.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001641, whisper_loss=0.09145, over 3835870.66 frames. ], batch size: 87, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:57:47,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2153430.0, ans=0.125
2024-08-13 11:57:52,683 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 from AS
2024-08-13 11:57:55,151 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS
2024-08-13 11:57:58,970 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 from AS
2024-08-13 11:58:16,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2153630.0, ans=0.0
2024-08-13 11:58:36,964 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 from AS
2024-08-13 11:58:45,409 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 11:58:53,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12500, loss[loss=0.07548, beats_loss=0.0101, ecapa_loss=0.000142, whisper_loss=0.06396, over 15894.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001635, whisper_loss=0.0913, over 3819566.26 frames. ], batch size: 62, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:59:02,324 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 11:59:02,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2153930.0, ans=0.0
2024-08-13 11:59:06,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2154030.0, ans=0.0
2024-08-13 11:59:06,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2154030.0, ans=0.125
2024-08-13 11:59:12,958 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 from AS
2024-08-13 11:59:18,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2154130.0, ans=0.125
2024-08-13 11:59:24,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0
2024-08-13 11:59:33,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2154230.0, ans=22.5
2024-08-13 11:59:48,247 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS
2024-08-13 11:59:54,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.421e+01 2.658e+01 2.977e+01 4.803e+01, threshold=5.316e+01, percent-clipped=0.0
2024-08-13 11:59:58,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12550, loss[loss=0.1066, beats_loss=0.008882, ecapa_loss=0.0001609, whisper_loss=0.09612, over 17108.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001622, whisper_loss=0.09087, over 3855037.49 frames. ], batch size: 64, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 11:59:58,717 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
20 from LS+wenet, 16 from Vox, 32 from AS
2024-08-13 12:00:05,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2154430.0, ans=0.0
2024-08-13 12:00:29,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2154630.0, ans=0.125
2024-08-13 12:00:41,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2154730.0, ans=0.1
2024-08-13 12:00:43,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2154730.0, ans=0.125
2024-08-13 12:00:50,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2154830.0, ans=0.125
2024-08-13 12:00:56,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2154830.0, ans=0.2
2024-08-13 12:01:04,055 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:01:04,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12600, loss[loss=0.09604, beats_loss=0.009636, ecapa_loss=0.0001912, whisper_loss=0.08449, over 13777.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001623, whisper_loss=0.09066, over 3853334.57 frames. ], batch size: 55, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 12:01:10,450 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 from AS
2024-08-13 12:01:10,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2154930.0, ans=0.2
2024-08-13 12:01:16,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2155030.0, ans=0.0
2024-08-13 12:01:18,813 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 12 from Vox, 42 from AS
2024-08-13 12:01:20,354 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 29 from Vox, 29 from AS
2024-08-13 12:01:25,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2155030.0, ans=0.125
2024-08-13 12:01:36,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2155130.0, ans=0.0
2024-08-13 12:01:38,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5
2024-08-13 12:01:40,359 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS
2024-08-13 12:01:42,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5
2024-08-13 12:01:56,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2155330.0, ans=0.1
2024-08-13 12:02:04,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2155330.0, ans=0.025
2024-08-13 12:02:06,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.360e+01 2.643e+01 2.873e+01 1.126e+02, threshold=5.286e+01, percent-clipped=2.0
2024-08-13 12:02:10,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12650, loss[loss=0.1029, beats_loss=0.01202, ecapa_loss=0.000146, whisper_loss=0.08946, over 22544.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01095, ecapa_loss=0.0001626, whisper_loss=0.09051, over 3888469.03 frames. ], batch size: 90, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 12:03:01,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2155830.0, ans=0.125
2024-08-13 12:03:03,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2155830.0, ans=0.125
2024-08-13 12:03:04,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5
2024-08-13 12:03:11,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2155830.0, ans=0.125
2024-08-13 12:03:14,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12700, loss[loss=0.09976, beats_loss=0.01083, ecapa_loss=0.000193, whisper_loss=0.087, over 22436.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01106, ecapa_loss=0.0001623, whisper_loss=0.09016, over 3907062.75 frames. ], batch size: 94, lr: 4.13e-03, grad_scale: 1.152921504606847e+18
2024-08-13 12:03:23,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2155930.0, ans=0.04949747468305833
2024-08-13 12:03:32,202 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 from AS
2024-08-13 12:03:37,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2156030.0, ans=0.125
2024-08-13 12:03:39,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2156030.0, ans=0.125
2024-08-13 12:03:53,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2156230.0, ans=0.1
2024-08-13 12:03:58,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2156230.0, ans=0.2
2024-08-13 12:03:59,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2156230.0, ans=0.0
2024-08-13 12:04:16,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2156330.0, ans=0.125
2024-08-13 12:04:17,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.422e+01 2.694e+01 3.051e+01 5.714e+01, threshold=5.388e+01, percent-clipped=1.0
2024-08-13 12:04:19,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12750, loss[loss=0.08831, beats_loss=0.01247, ecapa_loss=0.0001555, whisper_loss=0.07428, over 19572.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01104, ecapa_loss=0.0001615, whisper_loss=0.09058, over 3901600.49 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:04:31,833 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 from AS
2024-08-13 12:04:37,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2156530.0, ans=0.125
2024-08-13 12:04:37,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2156530.0, ans=0.05
2024-08-13 12:04:43,818 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 12:04:51,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2156630.0, ans=0.125
2024-08-13 12:04:58,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2156730.0, ans=0.0
2024-08-13 12:05:05,131 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 from AS
2024-08-13 12:05:09,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2156730.0, ans=0.0
2024-08-13 12:05:16,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2156830.0, ans=0.125
2024-08-13 12:05:27,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12800, loss[loss=0.1053, beats_loss=0.01311, ecapa_loss=0.0001909, whisper_loss=0.09029, over 16084.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.011, ecapa_loss=0.0001641, whisper_loss=0.09099, over 3878860.80 frames. ], batch size: 69, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:05:46,061 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 22 from Vox, 21 from AS
2024-08-13 12:05:50,119 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 12:05:53,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.76 vs. limit=10.0
2024-08-13 12:06:04,552 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 from AS
2024-08-13 12:06:08,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=12.0
2024-08-13 12:06:17,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2157230.0, ans=0.125
2024-08-13 12:06:22,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2157330.0, ans=0.0
2024-08-13 12:06:34,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.329e+01 2.579e+01 3.123e+01 7.384e+01, threshold=5.159e+01, percent-clipped=1.0
2024-08-13 12:06:37,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12850, loss[loss=0.1109, beats_loss=0.009282, ecapa_loss=0.0002102, whisper_loss=0.09954, over 16559.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001647, whisper_loss=0.09098, over 3858372.10 frames. ], batch size: 70, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:06:40,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2157430.0, ans=0.125
2024-08-13 12:07:08,093 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 from AS
2024-08-13 12:07:15,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0
2024-08-13 12:07:33,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2157730.0, ans=0.2
2024-08-13 12:07:39,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0
2024-08-13 12:07:49,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12900, loss[loss=0.1069, beats_loss=0.01019, ecapa_loss=0.0001587, whisper_loss=0.09509, over 17216.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01098, ecapa_loss=0.0001641, whisper_loss=0.09052, over 3855962.69 frames. ], batch size: 68, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:08:03,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
2024-08-13 12:08:13,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2158030.0, ans=0.09899494936611666
2024-08-13 12:08:39,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2158230.0, ans=0.125
2024-08-13 12:08:46,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2158230.0, ans=0.0
2024-08-13 12:08:55,930 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:08:57,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2158330.0, ans=0.0
2024-08-13 12:09:01,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.397e+01 2.771e+01 3.216e+01 4.644e+01, threshold=5.541e+01, percent-clipped=0.0
2024-08-13 12:09:02,493 INFO [scaling.py:214] (3/4) ScheduledFloat:
name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2158330.0, ans=0.5
2024-08-13 12:09:05,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 12950, loss[loss=0.1396, beats_loss=0.005094, ecapa_loss=0.0001802, whisper_loss=0.1327, over 15681.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001649, whisper_loss=0.0912, over 3836824.62 frames. ], batch size: 59, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:09:30,977 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS
2024-08-13 12:09:32,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2158530.0, ans=0.125
2024-08-13 12:09:33,067 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 12:09:33,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2158630.0, ans=0.125
2024-08-13 12:09:40,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2158630.0, ans=0.2
2024-08-13 12:09:41,684 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS
2024-08-13 12:10:01,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2158730.0, ans=0.125
2024-08-13 12:10:05,986 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 29 from Vox, 26 from AS
2024-08-13 12:10:13,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2158830.0, ans=0.125
2024-08-13 12:10:16,855 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:10:19,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13000, loss[loss=0.08194, beats_loss=0.01129, ecapa_loss=0.0001608, whisper_loss=0.06904, over 17737.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.000164, whisper_loss=0.0907, over 3814072.93 frames. ], batch size: 73, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:10:26,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2158930.0, ans=0.125
2024-08-13 12:10:29,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2158930.0, ans=0.125
2024-08-13 12:10:29,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2158930.0, ans=0.125
2024-08-13 12:10:32,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2159030.0, ans=0.0
2024-08-13 12:10:38,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2159030.0, ans=0.125
2024-08-13 12:11:08,681 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 24 from Vox, 45 from AS
2024-08-13 12:11:30,601 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.395e+01 2.755e+01 3.311e+01 7.767e+01, threshold=5.510e+01, percent-clipped=1.0
2024-08-13 12:11:33,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13050, loss[loss=0.114, beats_loss=0.009495, ecapa_loss=0.0001746, whisper_loss=0.1027, over 22431.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001643, whisper_loss=0.09018, over 3802634.25 frames. ], batch size: 92, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:11:36,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0
2024-08-13 12:11:37,446 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 from AS
2024-08-13 12:11:40,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2159430.0, ans=0.125
2024-08-13 12:11:46,478 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 32 from Vox, 38 from AS
2024-08-13 12:11:52,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2159530.0, ans=0.0
2024-08-13 12:11:54,177 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 12:12:03,860 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 from AS
2024-08-13 12:12:08,628 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 from AS
2024-08-13 12:12:22,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0
2024-08-13 12:12:24,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0
2024-08-13 12:12:32,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2159730.0, ans=0.125
2024-08-13 12:12:32,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2159730.0, ans=0.125
2024-08-13 12:12:42,819 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 from AS
2024-08-13 12:12:49,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2159930.0, ans=0.125
2024-08-13 12:12:50,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13100, loss[loss=0.1008, beats_loss=0.00945, ecapa_loss=0.0001899, whisper_loss=0.08945, over 22376.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.0001643, whisper_loss=0.09037, over 3808917.52 frames. ], batch size: 91, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:12:56,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2159930.0, ans=0.125
2024-08-13 12:13:11,641 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 12:13:11,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2160030.0, ans=0.125
2024-08-13 12:13:24,483 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 12:13:30,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2160130.0, ans=0.0
2024-08-13 12:13:56,004 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 19 from Vox, 16 from AS
2024-08-13 12:13:58,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5
2024-08-13 12:14:07,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2160330.0, ans=0.125
2024-08-13 12:14:07,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2160330.0, ans=0.0
2024-08-13 12:14:09,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.515e+01 2.737e+01 3.188e+01 6.948e+01, threshold=5.474e+01, percent-clipped=1.0
2024-08-13 12:14:10,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2160330.0, ans=0.125
2024-08-13 12:14:12,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13150, loss[loss=0.1071, beats_loss=0.008833, ecapa_loss=0.0002247, whisper_loss=0.09598, over 16009.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001647, whisper_loss=0.09022, over 3827808.74 frames. ], batch size: 68, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:14:30,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2160530.0, ans=0.125
2024-08-13 12:14:34,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2160530.0, ans=0.0
2024-08-13 12:14:45,021 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 from AS
2024-08-13 12:14:47,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2160630.0, ans=0.07
2024-08-13 12:15:01,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2160730.0, ans=0.125
2024-08-13 12:15:28,989 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS
2024-08-13 12:15:30,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2160830.0, ans=0.0
2024-08-13 12:15:32,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13200, loss[loss=0.1167, beats_loss=0.01051, ecapa_loss=0.0001682, whisper_loss=0.1045, over 22822.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001644, whisper_loss=0.09035, over 3855245.02 frames. ], batch size: 92, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:15:42,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2160930.0, ans=0.125
2024-08-13 12:15:45,676 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 from AS
2024-08-13 12:16:27,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2161230.0, ans=0.125
2024-08-13 12:16:39,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0
2024-08-13 12:16:49,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0
2024-08-13 12:16:51,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.301e+01 2.587e+01 2.900e+01 9.399e+01, threshold=5.174e+01, percent-clipped=1.0
2024-08-13 12:16:54,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13250, loss[loss=0.08394, beats_loss=0.01176, ecapa_loss=0.0001797, whisper_loss=0.07039, over 16160.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001657, whisper_loss=0.09023, over 3824519.64 frames. ], batch size: 67, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:16:58,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2161430.0, ans=0.125
2024-08-13 12:17:06,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2161430.0, ans=0.125
2024-08-13 12:17:35,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2161630.0, ans=0.2
2024-08-13 12:17:53,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2161730.0, ans=0.0
2024-08-13 12:17:54,056 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 from AS
2024-08-13 12:18:07,674 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 12:18:12,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13300, loss[loss=0.09338, beats_loss=0.009644, ecapa_loss=0.0001505, whisper_loss=0.08223, over 14301.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01094, ecapa_loss=0.0001653, whisper_loss=0.08968, over 3816780.88 frames.
], batch size: 57, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:18:15,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-13 12:18:23,401 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 12:18:26,116 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 12:18:54,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2162130.0, ans=0.0 2024-08-13 12:18:59,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2162230.0, ans=0.125 2024-08-13 12:19:10,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2162230.0, ans=0.125 2024-08-13 12:19:20,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-13 12:19:23,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-13 12:19:25,935 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 12:19:29,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.345e+01 2.598e+01 2.972e+01 4.210e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 12:19:33,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13350, loss[loss=0.09335, beats_loss=0.007816, ecapa_loss=0.0001832, whisper_loss=0.08371, over 16450.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001646, whisper_loss=0.09049, over 3803112.53 frames. 
], batch size: 63, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:19:36,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2162430.0, ans=0.125 2024-08-13 12:20:14,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2162630.0, ans=0.125 2024-08-13 12:20:30,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2162730.0, ans=0.125 2024-08-13 12:20:31,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2162730.0, ans=0.0 2024-08-13 12:20:45,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2162830.0, ans=0.05 2024-08-13 12:20:50,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13400, loss[loss=0.1191, beats_loss=0.009798, ecapa_loss=0.0001775, whisper_loss=0.1076, over 23068.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001644, whisper_loss=0.0907, over 3791700.68 frames. ], batch size: 93, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:20:51,977 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-13 12:21:04,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2163030.0, ans=0.125 2024-08-13 12:21:24,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-13 12:21:24,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.90 vs. 
limit=6.0 2024-08-13 12:21:26,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2024-08-13 12:21:32,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2163130.0, ans=0.125 2024-08-13 12:21:33,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2163130.0, ans=0.125 2024-08-13 12:21:35,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2163130.0, ans=0.0 2024-08-13 12:22:06,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.423e+01 2.749e+01 3.162e+01 4.773e+01, threshold=5.498e+01, percent-clipped=0.0 2024-08-13 12:22:07,833 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 12:22:08,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13450, loss[loss=0.1142, beats_loss=0.01087, ecapa_loss=0.0001687, whisper_loss=0.1016, over 22094.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001665, whisper_loss=0.09075, over 3851706.82 frames. ], batch size: 88, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:22:10,025 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 12:22:10,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2163430.0, ans=0.125 2024-08-13 12:22:32,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2163530.0, ans=0.125 2024-08-13 12:22:32,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2163530.0, ans=0.125 2024-08-13 12:22:46,905 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 12:22:50,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-08-13 12:23:13,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2163830.0, ans=0.1 2024-08-13 12:23:16,534 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-13 12:23:23,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2163830.0, ans=0.0 2024-08-13 12:23:26,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13500, loss[loss=0.1057, beats_loss=0.009269, ecapa_loss=0.0001605, whisper_loss=0.09486, over 14659.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001655, whisper_loss=0.09084, over 3841086.71 frames. ], batch size: 56, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:23:29,723 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 12:23:32,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2163930.0, ans=0.0 2024-08-13 12:23:49,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2164030.0, ans=0.04949747468305833 2024-08-13 12:24:10,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2164130.0, ans=0.125 2024-08-13 12:24:10,703 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.465e+00 2024-08-13 12:24:14,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2164230.0, ans=0.125 2024-08-13 12:24:14,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2024-08-13 12:24:32,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2164330.0, ans=0.125 2024-08-13 12:24:41,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.326e+01 2.605e+01 3.115e+01 6.571e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-13 12:24:44,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2164430.0, ans=0.125 2024-08-13 12:24:45,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13550, loss[loss=0.08842, beats_loss=0.009754, ecapa_loss=0.0001567, whisper_loss=0.0771, over 19918.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001638, whisper_loss=0.09095, over 3857338.98 frames. 
], batch size: 81, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:24:45,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2164430.0, ans=0.1 2024-08-13 12:25:11,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2164530.0, ans=0.125 2024-08-13 12:25:42,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2164730.0, ans=0.2 2024-08-13 12:25:49,597 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 12:25:57,459 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 12:26:02,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13600, loss[loss=0.1004, beats_loss=0.01144, ecapa_loss=0.0001398, whisper_loss=0.08758, over 22855.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001636, whisper_loss=0.09169, over 3867089.40 frames. ], batch size: 91, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:26:02,790 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 12:26:07,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2024-08-13 12:26:25,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2165030.0, ans=0.1 2024-08-13 12:26:28,001 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 12:26:30,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2165030.0, ans=0.2 2024-08-13 12:26:47,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2165130.0, ans=0.125 2024-08-13 12:26:49,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2165230.0, ans=0.125 2024-08-13 12:26:59,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2165230.0, ans=0.125 2024-08-13 12:27:06,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2024-08-13 12:27:16,395 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 12:27:17,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.537e+01 2.794e+01 3.122e+01 4.623e+01, threshold=5.587e+01, percent-clipped=0.0 2024-08-13 12:27:20,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13650, loss[loss=0.1293, beats_loss=0.006564, ecapa_loss=0.000169, whisper_loss=0.1211, over 23260.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001639, whisper_loss=0.09161, over 3890261.72 frames. ], batch size: 88, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:27:23,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-08-13 12:27:25,829 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 12:27:30,033 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 12:27:34,473 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 12:27:49,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2165530.0, ans=0.125 2024-08-13 12:28:27,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2165830.0, ans=0.0 2024-08-13 12:28:38,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13700, loss[loss=0.1109, beats_loss=0.006686, ecapa_loss=0.0001982, whisper_loss=0.1023, over 16837.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001651, whisper_loss=0.09182, over 3906123.99 frames. ], batch size: 66, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:28:41,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2165930.0, ans=0.1 2024-08-13 12:28:44,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2165930.0, ans=0.125 2024-08-13 12:28:46,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-13 12:28:56,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2166030.0, ans=0.125 2024-08-13 12:29:01,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-08-13 12:29:06,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. 
limit=22.5 2024-08-13 12:29:21,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166130.0, ans=0.1 2024-08-13 12:29:28,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-13 12:29:41,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2166330.0, ans=10.0 2024-08-13 12:29:51,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166330.0, ans=0.1 2024-08-13 12:29:52,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.509e+01 2.844e+01 3.319e+01 7.223e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 12:29:55,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13750, loss[loss=0.1138, beats_loss=0.009579, ecapa_loss=0.0001777, whisper_loss=0.1024, over 18265.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001654, whisper_loss=0.09175, over 3892943.32 frames. ], batch size: 70, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:30:36,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2166630.0, ans=0.0 2024-08-13 12:30:41,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166730.0, ans=0.1 2024-08-13 12:30:47,178 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 12:30:55,822 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 12:30:59,990 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
16 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 12:31:12,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13800, loss[loss=0.1116, beats_loss=0.008034, ecapa_loss=0.000165, whisper_loss=0.1019, over 16645.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.000164, whisper_loss=0.09131, over 3857886.69 frames. ], batch size: 60, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:31:12,865 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 12:31:20,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2166930.0, ans=0.125 2024-08-13 12:31:52,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2167130.0, ans=0.05 2024-08-13 12:32:01,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-13 12:32:05,089 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 40 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 12:32:05,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2167230.0, ans=0.0 2024-08-13 12:32:18,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-13 12:32:26,651 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.343e+01 2.633e+01 2.825e+01 4.077e+01, threshold=5.266e+01, percent-clipped=0.0 2024-08-13 12:32:30,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13850, loss[loss=0.1059, beats_loss=0.01036, ecapa_loss=0.0001434, whisper_loss=0.09411, over 22281.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001644, whisper_loss=0.09184, over 3876506.78 frames. ], batch size: 86, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:32:40,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2167430.0, ans=0.2 2024-08-13 12:32:44,635 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 12:32:49,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2167530.0, ans=0.2 2024-08-13 12:32:55,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-13 12:33:17,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2167730.0, ans=0.0 2024-08-13 12:33:23,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2167730.0, ans=0.125 2024-08-13 12:33:23,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2167730.0, ans=0.07 2024-08-13 12:33:47,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13900, loss[loss=0.08521, beats_loss=0.008239, ecapa_loss=0.0001731, whisper_loss=0.07524, over 16824.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01081, ecapa_loss=0.0001642, whisper_loss=0.09192, over 3902536.64 frames. ], batch size: 66, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:33:56,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. 
limit=22.5 2024-08-13 12:34:04,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2168030.0, ans=0.1 2024-08-13 12:34:08,273 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 12:34:08,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2168030.0, ans=0.0 2024-08-13 12:34:10,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-08-13 12:34:23,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2168130.0, ans=0.0 2024-08-13 12:34:32,073 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 10 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 12:34:35,268 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-13 12:34:39,642 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 12:34:44,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.44 vs. limit=22.5 2024-08-13 12:34:53,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2168330.0, ans=0.125 2024-08-13 12:35:02,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.480e+01 2.802e+01 3.173e+01 5.254e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 12:35:05,015 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 13950, loss[loss=0.09866, beats_loss=0.01225, ecapa_loss=0.0001334, whisper_loss=0.08507, over 17256.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.0001647, whisper_loss=0.09188, over 3911477.02 frames. 
], batch size: 70, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:35:20,821 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 12:35:22,423 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 12:35:26,857 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 12:35:28,781 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 12:35:29,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5 2024-08-13 12:35:30,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=22.5 2024-08-13 12:35:35,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2168630.0, ans=0.125 2024-08-13 12:35:38,316 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-13 12:35:42,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2168630.0, ans=0.05 2024-08-13 12:36:08,846 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 12:36:21,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2168830.0, ans=0.1 2024-08-13 12:36:31,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14000, loss[loss=0.09049, beats_loss=0.01209, ecapa_loss=0.0001488, whisper_loss=0.07692, over 17120.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001642, whisper_loss=0.09174, over 3924739.79 frames. 
], batch size: 68, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:36:42,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2024-08-13 12:36:43,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2168930.0, ans=0.125 2024-08-13 12:36:43,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2168930.0, ans=0.0 2024-08-13 12:36:45,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-08-13 12:37:00,060 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 12:37:16,560 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-13 12:37:22,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2169230.0, ans=0.125 2024-08-13 12:37:35,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2169230.0, ans=0.0 2024-08-13 12:37:42,454 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 12:37:54,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2169330.0, ans=0.1 2024-08-13 12:37:55,809 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
12 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-13 12:37:56,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.610e+01 2.869e+01 3.326e+01 4.545e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-13 12:38:02,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14050, loss[loss=0.1241, beats_loss=0.009894, ecapa_loss=0.0001636, whisper_loss=0.1126, over 16681.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01074, ecapa_loss=0.0001635, whisper_loss=0.09207, over 3915246.45 frames. ], batch size: 65, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:38:11,715 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 12:38:47,493 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 12:39:03,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-13 12:39:27,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.28 vs. limit=6.0 2024-08-13 12:39:45,977 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 12:39:47,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2169930.0, ans=0.125 2024-08-13 12:39:48,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14100, loss[loss=0.1148, beats_loss=0.01222, ecapa_loss=0.0001496, whisper_loss=0.1011, over 21936.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001636, whisper_loss=0.09108, over 3888767.28 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:40:38,506 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 12:40:59,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=12.0 2024-08-13 12:41:14,064 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-13 12:41:15,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2024-08-13 12:41:23,196 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 12:41:26,716 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 12:41:39,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.420e+01 2.685e+01 3.019e+01 4.436e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 12:41:45,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14150, loss[loss=0.106, beats_loss=0.008087, ecapa_loss=0.0002095, whisper_loss=0.09587, over 20201.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001642, whisper_loss=0.0907, over 3892831.68 frames. ], batch size: 85, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:42:09,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2170530.0, ans=0.125 2024-08-13 12:42:20,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2170530.0, ans=0.125 2024-08-13 12:43:12,011 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
26 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-13 12:43:14,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2170730.0, ans=0.125 2024-08-13 12:43:34,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2170830.0, ans=0.0 2024-08-13 12:43:34,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2170830.0, ans=0.0 2024-08-13 12:43:46,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14200, loss[loss=0.1165, beats_loss=0.01029, ecapa_loss=0.0001818, whisper_loss=0.1044, over 23222.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001622, whisper_loss=0.09104, over 3925889.21 frames. ], batch size: 96, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:43:55,176 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-13 12:44:01,553 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 12:44:48,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2171130.0, ans=0.07 2024-08-13 12:45:03,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2024-08-13 12:45:15,573 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 12:45:23,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2171330.0, ans=0.0 2024-08-13 12:45:25,408 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
14 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 12:45:38,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2171330.0, ans=0.125 2024-08-13 12:45:43,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.405e+01 2.774e+01 3.077e+01 4.390e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 12:45:48,035 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 12:45:49,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14250, loss[loss=0.09007, beats_loss=0.01204, ecapa_loss=0.0001681, whisper_loss=0.07634, over 21318.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001615, whisper_loss=0.09103, over 3926873.95 frames. ], batch size: 88, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:45:49,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2171430.0, ans=0.125 2024-08-13 12:46:05,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2171430.0, ans=0.1 2024-08-13 12:46:12,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2024-08-13 12:46:13,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. 
limit=15.0 2024-08-13 12:46:24,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2171530.0, ans=0.0 2024-08-13 12:46:25,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2171630.0, ans=0.125 2024-08-13 12:46:35,626 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.746e+01 2024-08-13 12:46:50,251 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 12:47:01,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2171830.0, ans=0.125 2024-08-13 12:47:13,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14300, loss[loss=0.07627, beats_loss=0.01566, ecapa_loss=0.0001336, whisper_loss=0.05927, over 18922.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001622, whisper_loss=0.09134, over 3932563.23 frames. ], batch size: 78, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:47:13,473 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 12:47:21,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2171930.0, ans=0.125 2024-08-13 12:47:36,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2172030.0, ans=0.125 2024-08-13 12:47:41,084 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 12:47:51,010 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 12:47:56,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2024-08-13 12:48:00,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-13 12:48:12,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2172230.0, ans=0.2 2024-08-13 12:48:19,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2172330.0, ans=0.0 2024-08-13 12:48:23,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2024-08-13 12:48:30,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2172330.0, ans=0.125 2024-08-13 12:48:30,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.493e+01 2.701e+01 3.105e+01 1.229e+02, threshold=5.402e+01, percent-clipped=5.0 2024-08-13 12:48:34,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14350, loss[loss=0.08299, beats_loss=0.01108, ecapa_loss=0.0001708, whisper_loss=0.0702, over 17371.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001626, whisper_loss=0.09111, over 3934692.82 frames. ], batch size: 72, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:48:37,859 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-13 12:48:39,699 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 29 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-13 12:48:48,519 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 12:49:02,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2172530.0, ans=0.2 2024-08-13 12:49:07,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2172630.0, ans=0.0 2024-08-13 12:49:09,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2172630.0, ans=0.125 2024-08-13 12:49:22,728 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 12:49:52,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-13 12:49:54,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14400, loss[loss=0.1242, beats_loss=0.01049, ecapa_loss=0.0001479, whisper_loss=0.1122, over 23171.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01094, ecapa_loss=0.0001629, whisper_loss=0.09066, over 3930586.28 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:50:16,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2173030.0, ans=0.125 2024-08-13 12:50:29,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2173130.0, ans=0.2 2024-08-13 12:50:40,737 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 12:50:48,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2173230.0, ans=0.0 2024-08-13 12:50:49,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2173230.0, ans=0.125 2024-08-13 12:50:56,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173230.0, ans=0.1 2024-08-13 12:50:56,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. limit=10.0 2024-08-13 12:51:13,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.394e+01 2.605e+01 2.942e+01 4.760e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 12:51:17,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 15, batch 14450, loss[loss=0.0722, beats_loss=0.01167, ecapa_loss=0.0001783, whisper_loss=0.05875, over 19966.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001644, whisper_loss=0.09079, over 3927044.48 frames. ], batch size: 83, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:51:35,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2173530.0, ans=0.1 2024-08-13 12:51:50,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2173630.0, ans=0.1 2024-08-13 12:51:53,254 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 12:52:00,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2173630.0, ans=0.0 2024-08-13 12:52:46,779 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 12:52:47,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 0, loss[loss=0.09158, beats_loss=0.01052, ecapa_loss=0.0001643, whisper_loss=0.07942, over 17020.00 frames. ], tot_loss[loss=0.09158, beats_loss=0.01052, ecapa_loss=0.0001643, whisper_loss=0.07942, over 17020.00 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:52:47,912 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 12:53:29,387 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005644, whisper_loss=0.2485, over 922467.00 frames. 2024-08-13 12:53:45,268 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.00454, beats_loss=0, ecapa_loss=0.000454, whisper_loss=0, over 939242.00 frames. 2024-08-13 12:54:21,399 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9100, 3.7427, 3.2490, 3.5245], device='cuda:3') 2024-08-13 12:54:34,632 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4905, 2.3014, 2.6357, 1.1893], device='cuda:3') 2024-08-13 12:55:41,391 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 12:55:41,394 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32257MB 2024-08-13 12:56:04,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. 
limit=15.0 2024-08-13 12:56:26,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2173910.0, ans=15.0 2024-08-13 12:56:28,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:57:18,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2174110.0, ans=0.2 2024-08-13 12:57:47,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 50, loss[loss=0.1116, beats_loss=0.008117, ecapa_loss=0.0002278, whisper_loss=0.1012, over 16294.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01017, ecapa_loss=0.0001657, whisper_loss=0.09089, over 867073.96 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:58:09,604 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 12:58:11,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.946e+01 3.270e+01 5.312e+01, threshold=5.891e+01, percent-clipped=1.0 2024-08-13 12:58:18,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2174410.0, ans=0.2 2024-08-13 12:58:19,042 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-13 12:58:21,124 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 12:58:36,841 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 12:58:41,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2174510.0, ans=0.05 2024-08-13 12:58:41,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2174510.0, ans=0.125 2024-08-13 12:58:53,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2174510.0, ans=0.0 2024-08-13 12:58:57,973 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 12:59:43,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 100, loss[loss=0.1109, beats_loss=0.007553, ecapa_loss=0.0001562, whisper_loss=0.1018, over 15446.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.009805, ecapa_loss=0.0001655, whisper_loss=0.09224, over 1528892.85 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:00:05,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2174810.0, ans=0.125 2024-08-13 13:00:22,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2174910.0, ans=0.125 2024-08-13 13:00:44,484 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 13:00:57,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-13 13:00:58,326 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 13:01:08,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2175110.0, ans=0.125 2024-08-13 13:01:27,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-13 13:01:27,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2175210.0, ans=0.125 2024-08-13 13:01:27,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2175210.0, ans=0.0 2024-08-13 13:01:34,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 150, loss[loss=0.08774, beats_loss=0.01178, ecapa_loss=0.0001494, whisper_loss=0.07446, over 19042.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009883, ecapa_loss=0.0001646, whisper_loss=0.09038, over 2030626.39 frames. 
], batch size: 75, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:01:52,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.691e+01 2.914e+01 3.205e+01 4.939e+01, threshold=5.827e+01, percent-clipped=0.0 2024-08-13 13:01:53,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2175410.0, ans=0.0 2024-08-13 13:02:01,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2175410.0, ans=0.5 2024-08-13 13:02:27,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 13:02:30,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2175610.0, ans=0.1 2024-08-13 13:02:35,089 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 13:02:38,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 13:02:38,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2175610.0, ans=0.125 2024-08-13 13:02:49,169 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 13:02:57,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 200, loss[loss=0.09498, beats_loss=0.008724, ecapa_loss=0.0001852, whisper_loss=0.0844, over 20916.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01008, ecapa_loss=0.0001657, whisper_loss=0.09042, over 2432430.22 frames. 
], batch size: 85, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:03:19,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2175910.0, ans=0.125 2024-08-13 13:03:30,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2176010.0, ans=0.125 2024-08-13 13:03:42,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2176110.0, ans=0.125 2024-08-13 13:04:12,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2176310.0, ans=0.5 2024-08-13 13:04:13,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 250, loss[loss=0.08956, beats_loss=0.01029, ecapa_loss=0.0001703, whisper_loss=0.07757, over 18238.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01023, ecapa_loss=0.0001649, whisper_loss=0.09053, over 2703282.60 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:04:16,449 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 13:04:27,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+01 2.289e+01 2.601e+01 2.843e+01 4.467e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 13:04:32,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2176410.0, ans=0.0 2024-08-13 13:04:33,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2024-08-13 13:04:34,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. 
limit=15.0 2024-08-13 13:04:41,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2176510.0, ans=0.2 2024-08-13 13:04:44,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2176510.0, ans=0.0 2024-08-13 13:04:44,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2024-08-13 13:04:47,173 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-13 13:05:00,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2176610.0, ans=0.125 2024-08-13 13:05:00,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2176610.0, ans=0.125 2024-08-13 13:05:01,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2176610.0, ans=0.0 2024-08-13 13:05:03,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2176610.0, ans=0.2 2024-08-13 13:05:05,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2176610.0, ans=10.0 2024-08-13 13:05:14,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2176710.0, ans=0.2 2024-08-13 13:05:25,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 300, loss[loss=0.1087, beats_loss=0.01017, ecapa_loss=0.0002143, whisper_loss=0.0964, over 18873.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001651, whisper_loss=0.08982, over 2903570.07 frames. 
], batch size: 79, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:05:27,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2176810.0, ans=0.125 2024-08-13 13:05:32,508 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 13:05:42,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-13 13:05:43,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-13 13:05:45,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2176910.0, ans=0.125 2024-08-13 13:05:48,469 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 13:05:56,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2177010.0, ans=0.125 2024-08-13 13:05:58,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2177010.0, ans=0.125 2024-08-13 13:06:07,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2177110.0, ans=0.125 2024-08-13 13:06:09,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2177110.0, ans=0.125 2024-08-13 13:06:19,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2024-08-13 13:06:24,191 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 13:06:27,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2177210.0, ans=0.2 2024-08-13 13:06:34,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-13 13:06:38,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 350, loss[loss=0.09468, beats_loss=0.01029, ecapa_loss=0.0001928, whisper_loss=0.08247, over 14530.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001654, whisper_loss=0.08962, over 3098657.78 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:06:40,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2177310.0, ans=0.125 2024-08-13 13:06:52,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.376e+01 2.584e+01 2.917e+01 1.097e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-13 13:07:18,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2177510.0, ans=0.09899494936611666 2024-08-13 13:07:41,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.03 vs. 
limit=12.0 2024-08-13 13:07:44,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2177710.0, ans=0.015 2024-08-13 13:07:47,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2177710.0, ans=0.025 2024-08-13 13:07:47,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2177710.0, ans=0.0 2024-08-13 13:07:51,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 400, loss[loss=0.08423, beats_loss=0.009272, ecapa_loss=0.0001168, whisper_loss=0.07379, over 17937.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01053, ecapa_loss=0.0001636, whisper_loss=0.08859, over 3239253.25 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:07:54,319 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 13:08:19,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2178010.0, ans=0.0 2024-08-13 13:08:21,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2178010.0, ans=0.125 2024-08-13 13:08:29,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2178010.0, ans=0.125 2024-08-13 13:08:37,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2178110.0, ans=0.0 2024-08-13 13:08:43,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2178110.0, ans=0.0 2024-08-13 13:09:02,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 450, loss[loss=0.1044, beats_loss=0.008331, ecapa_loss=0.0001603, whisper_loss=0.0945, over 16364.00 frames. 
], tot_loss[loss=0.1003, beats_loss=0.0106, ecapa_loss=0.0001635, whisper_loss=0.08803, over 3353044.81 frames. ], batch size: 62, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:09:16,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.363e+01 2.643e+01 2.945e+01 6.968e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-13 13:09:19,252 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 13:09:19,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0 2024-08-13 13:09:20,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=15.0 2024-08-13 13:09:20,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2178410.0, ans=0.125 2024-08-13 13:09:31,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.64 vs. limit=22.5 2024-08-13 13:09:34,663 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 13:09:49,576 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 13:09:58,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2178710.0, ans=0.125 2024-08-13 13:09:58,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. 
limit=15.0 2024-08-13 13:10:01,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2178710.0, ans=0.0 2024-08-13 13:10:14,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 500, loss[loss=0.08579, beats_loss=0.01161, ecapa_loss=0.0001602, whisper_loss=0.07258, over 16786.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001628, whisper_loss=0.08909, over 3476262.54 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:10:17,060 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 32 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 13:10:22,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-13 13:10:51,752 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 13:10:54,539 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 13:11:16,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2179210.0, ans=0.0 2024-08-13 13:11:17,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0 2024-08-13 13:11:27,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2179310.0, ans=0.125 2024-08-13 13:11:28,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 550, loss[loss=0.1084, beats_loss=0.009838, ecapa_loss=0.000159, whisper_loss=0.09693, over 22591.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.000162, whisper_loss=0.08988, over 3555101.59 frames. 
], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:11:41,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2024-08-13 13:11:43,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.371e+01 2.596e+01 2.960e+01 4.995e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-13 13:12:00,925 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 13:12:01,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2179510.0, ans=0.1 2024-08-13 13:12:02,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2179510.0, ans=0.0 2024-08-13 13:12:10,340 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 13:12:10,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2179610.0, ans=0.2 2024-08-13 13:12:18,959 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 13:12:34,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179710.0, ans=0.1 2024-08-13 13:12:40,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 600, loss[loss=0.1065, beats_loss=0.01015, ecapa_loss=0.0001631, whisper_loss=0.09475, over 16518.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001602, whisper_loss=0.09056, over 3637158.84 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:12:43,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=12.0 2024-08-13 13:12:45,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2179810.0, ans=0.125 2024-08-13 13:12:49,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-13 13:13:02,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2179910.0, ans=0.2 2024-08-13 13:13:06,076 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-13 13:13:09,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2180010.0, ans=0.125 2024-08-13 13:13:20,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2180010.0, ans=0.125 2024-08-13 13:13:31,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2180110.0, ans=0.2 2024-08-13 13:13:41,000 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 13:13:48,094 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS 2024-08-13 13:13:52,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2180310.0, ans=0.125 2024-08-13 13:13:53,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 650, loss[loss=0.1065, beats_loss=0.01285, ecapa_loss=0.0001609, whisper_loss=0.09201, over 22208.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001588, whisper_loss=0.0903, over 3678533.95 frames. 
], batch size: 88, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:13:55,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2180310.0, ans=0.0 2024-08-13 13:14:08,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.791e+01 3.201e+01 6.340e+01, threshold=5.582e+01, percent-clipped=1.0 2024-08-13 13:14:13,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0 2024-08-13 13:14:21,539 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-13 13:14:38,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2180610.0, ans=0.0 2024-08-13 13:14:42,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2180610.0, ans=0.125 2024-08-13 13:14:49,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2180610.0, ans=0.125 2024-08-13 13:14:58,823 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS 2024-08-13 13:15:02,769 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 from AS 2024-08-13 13:15:06,858 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 700, loss[loss=0.07973, beats_loss=0.01163, ecapa_loss=0.0001574, whisper_loss=0.06653, over 18228.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001601, whisper_loss=0.09062, over 3727418.08 frames. ], batch size: 72, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:15:08,661 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
12 from LS+wenet, 27 from Vox, 30 from AS 2024-08-13 13:15:11,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-13 13:15:18,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2180810.0, ans=0.95 2024-08-13 13:15:39,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2181010.0, ans=0.125 2024-08-13 13:15:42,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2181010.0, ans=0.125 2024-08-13 13:15:44,887 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 14 from Vox, 42 from AS 2024-08-13 13:16:16,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2181210.0, ans=0.125 2024-08-13 13:16:19,403 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 11 from Vox, 28 from AS 2024-08-13 13:16:22,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 750, loss[loss=0.1077, beats_loss=0.01266, ecapa_loss=0.0001299, whisper_loss=0.09379, over 19460.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001591, whisper_loss=0.09001, over 3763218.09 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:16:29,003 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 30 from Vox, 33 from AS 2024-08-13 13:16:37,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.319e+01 2.745e+01 2.985e+01 9.286e+01, threshold=5.489e+01, percent-clipped=1.0 2024-08-13 13:16:41,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2181410.0, ans=0.5 2024-08-13 13:16:49,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2181410.0, ans=0.125 2024-08-13 13:17:00,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2181510.0, ans=0.1 2024-08-13 13:17:03,609 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 from AS 2024-08-13 13:17:37,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 800, loss[loss=0.1091, beats_loss=0.01007, ecapa_loss=0.0001622, whisper_loss=0.09739, over 18521.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01072, ecapa_loss=0.0001599, whisper_loss=0.0899, over 3760919.67 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:17:50,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-13 13:18:03,341 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 13:18:12,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2182010.0, ans=0.0 2024-08-13 13:18:22,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2182110.0, ans=0.0 2024-08-13 13:18:49,237 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 13:18:52,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 850, loss[loss=0.08125, beats_loss=0.01078, ecapa_loss=0.0001612, whisper_loss=0.06885, over 17035.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001598, whisper_loss=0.09003, over 3767285.74 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:19:08,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.299e+01 2.538e+01 2.916e+01 7.643e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-13 13:19:14,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2182410.0, ans=0.035 2024-08-13 13:19:15,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=15.0 2024-08-13 13:19:28,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2182510.0, ans=0.125 2024-08-13 13:19:36,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2182510.0, ans=0.0 2024-08-13 13:19:52,310 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 13:19:54,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.98 vs. 
limit=22.5 2024-08-13 13:19:57,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2182710.0, ans=0.2 2024-08-13 13:20:00,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2182710.0, ans=0.125 2024-08-13 13:20:07,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 900, loss[loss=0.09585, beats_loss=0.01295, ecapa_loss=0.0001444, whisper_loss=0.08145, over 23304.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001604, whisper_loss=0.09017, over 3764552.12 frames. ], batch size: 96, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:20:17,690 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 13 from Vox, 21 from AS 2024-08-13 13:20:34,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2182910.0, ans=0.125 2024-08-13 13:20:59,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2183110.0, ans=0.0 2024-08-13 13:21:06,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2183110.0, ans=0.0 2024-08-13 13:21:07,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2183110.0, ans=0.125 2024-08-13 13:21:19,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. 
limit=15.0 2024-08-13 13:21:24,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2183210.0, ans=0.1 2024-08-13 13:21:26,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2183210.0, ans=0.0 2024-08-13 13:21:35,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 950, loss[loss=0.1157, beats_loss=0.009963, ecapa_loss=0.0001484, whisper_loss=0.1043, over 23130.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001599, whisper_loss=0.08964, over 3770499.40 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:21:43,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2183310.0, ans=0.125 2024-08-13 13:21:53,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.345e+01 2.599e+01 2.801e+01 4.371e+01, threshold=5.198e+01, percent-clipped=0.0 2024-08-13 13:21:58,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2183410.0, ans=0.125 2024-08-13 13:22:02,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2183410.0, ans=0.0 2024-08-13 13:22:08,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2183410.0, ans=0.1 2024-08-13 13:22:25,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2183510.0, ans=0.125 2024-08-13 13:22:35,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-13 13:22:51,552 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
23 from LS+wenet, 20 from Vox, 43 from AS 2024-08-13 13:23:05,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2183710.0, ans=0.125 2024-08-13 13:23:14,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1000, loss[loss=0.1133, beats_loss=0.008885, ecapa_loss=0.000171, whisper_loss=0.1027, over 16281.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01078, ecapa_loss=0.0001579, whisper_loss=0.08926, over 3791908.57 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:23:24,083 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 17 from Vox, 40 from AS 2024-08-13 13:23:29,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2183810.0, ans=0.125 2024-08-13 13:23:37,732 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 29 from Vox, 31 from AS 2024-08-13 13:23:59,214 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 from AS 2024-08-13 13:24:16,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2184010.0, ans=0.0
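The `loss[...]` and `tot_loss[...]` entries above decompose the training objective into its three knowledge-distillation components. The logged totals are consistent with a weighted sum using the scales recorded in the run config at the top of this log (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that aggregation follows; the function name `total_loss` is illustrative, not taken from `train_multi_KD3.py`.

```python
# Sketch (assumption, not the actual training code): combine the three logged
# loss components with the scales from the run config. Note ecapa_loss is
# logged unscaled, so it contributes 10x its printed value to the total.

def total_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Check against the "Epoch 16, batch 500" tot_loss entry above:
# beats_loss=0.01064, ecapa_loss=0.0001628, whisper_loss=0.08909
loss = total_loss(0.01064, 0.0001628, 0.08909)
assert round(loss, 4) == 0.1014  # matches the logged tot_loss loss=0.1014
```

The same identity holds for the other `tot_loss` entries in this window (e.g. batch 1000: 0.01078 + 10 x 0.0001579 + 0.08926 rounds to the logged 0.1016).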